Pure and Applied Mathematics: A Wiley Series of Texts, Monographs, and Tracts 


Kenneth Shiskowski and Karl Frinkle 




Contents 

Preface 

Conventions and Notations 

Chapter 1: An Introduction to Mathematica 

1.1 The Very Basics 

1.2 Basic Arithmetic 

1.3 Lists and Matrices 

1.4 Expressions versus Functions 

1.5 Plotting and Animations 

1.6 Solving Systems of Equations 

1.7 Basic Programming 

Chapter 2: Linear Systems of Equations and Matrices 

2.1 Linear Systems of Equations 

2.2 Augmented Matrix of a Linear System and Row Operations 
2.3 Some Matrix Arithmetic 

Chapter 3: Gauss—Jordan Elimination and Reduced Row Echelon Form 
3.1 Gauss-Jordan Elimination and rref 

3.2 Elementary Matrices 

3.3 Sensitivity of Solutions to Error in the Linear System 
Chapter 4: Applications of Linear Systems and Matrices 
4.1 Applications of Linear Systems to Geometry 

4.2 Applications of Linear Systems to Curve Fitting 

4.3 Applications of Linear Systems to Economics 


4.4 Applications of Matrix Multiplication to Geometry 


4.5 An Application of Matrix Multiplication to Economics 
Chapter 5: Determinants, Inverses, and Cramer’s Rule 

5.1 Determinants and Inverses from the Adjoint Formula 
5.2 Finding Determinants by Expanding along Any Row or Column 
5.3 Determinants Found by Triangularizing Matrices 

5.4 LU Factorization 

5.5 Inverses from rref 

5.6 Cramer’s Rule 

Chapter 6: Basic Vector Algebra Topics 

6.1 Vectors 

6.2 Dot Product 

6.3 Cross Product 

6.4 Vector Projection 

Chapter 7: A Few Advanced Vector Algebra Topics 

7.1 Rotations in Space 

7.2 “Rolling” a Circle along a Curve 

7.3 The TNB Frame 

Chapter 8: Independence, Basis, and Dimension for Subspaces of R^n
8.1 Subspaces of R^n

8.2 Independent and Dependent Sets of Vectors in R^n

8.3 Basis and Dimension for Subspaces of R^n

8.4 Vector Projection onto a Subspace of R^n

8.5 The Gram-Schmidt Orthonormalization Process 


Chapter 9: Linear Maps from R^n to R^m


9.1 Basics about Linear Maps 

9.2 The Kernel and Image Subspaces of a Linear Map 

9.3 Composites of Two Linear Maps and Inverses 

9.4 Change of Bases for the Matrix Representation of a Linear Map 
Chapter 10: The Geometry of Linear and Affine Maps 


10.1 The Effect of a Linear Map on Area and Arclength in Two Dimensions

10.2 The Decomposition of Linear Maps into Rotations, Reflections, and Rescalings in R^2

10.3 The Effect of Linear Maps on Volume, Area, and Arclength in R^3
10.4 Rotations, Reflections, and Rescalings in Three Dimensions

10.5 Affine Maps 

Chapter 11: Least-Squares Fits and Pseudoinverses 


11.1 Pseudoin 
Overdetermined Linear System 


11.2 Fits and Pseudoinverses 
11.3 Least-Squares Fits and Pseudoinverses 


Chapter 12: Eigenvalues and Eigenvectors 


Them 


12.2 Summary of Definitions and Methods for Computing Eigenvalues 
and Eigenvectors as Well as the Exponential of a Matrix 


12.3 Applications of the Diagonalizability of Square Matrices 
12.4 Solving a Square First-Order Linear System of Differential Equations 
12.5 Basic Facts about Eigenvalues, Eigenvectors, and Diagonalizability 


12.6 The Geometry of the Ellipse Using Eigenvalues and Eigenvectors 


12.7 A Mathematica Eigen-Function 
Bibliographic Material 
Indexes 


PURE AND APPLIED MATHEMATICS 
A Wiley Series of Texts, Monographs, and Tracts 
Founded by RICHARD COURANT 


Editors Emeriti: MYRON B. ALLEN III, DAVID A. COX, PETER 
HILTON, HARRY HOCHSTADT, PETER LAX, JOHN TOLAND 


A complete list of the titles in this series appears at the end of this volume. 


Principles of 
Linear Algebra 
With Mathematica®


Kenneth Shiskowski 
Department of Mathematics 
Eastern Michigan University 
Ypsilanti, MI 


Karl Frinkle 


Department of Mathematics 
Southeastern Oklahoma State University 
Durant, OK 




Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved. 
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. 
Published simultaneously in Canada. 


No part of this publication may be reproduced, stored in a retrieval 
system, or transmitted in any form or by any means, electronic, 
mechanical, photocopying, recording, scanning, or otherwise, except as 
permitted under Section 107 or 108 of the 1976 United States Copyright 
Act, without either the prior written permission of the Publisher, or 
authorization through payment of the appropriate per-copy fee to the 
Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 
01923, (978) 750-8400, fax (978) 750-4470, or on the web at 
www.copyright.com. Requests to the Publisher for permission should be 
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 
River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or 
online at http://www.wiley.com/go/permission.


Limit of Liability/Disclaimer of Warranty: While the publisher and author 
have used their best efforts in preparing this book, they make no 
representations or warranties with respect to the accuracy or completeness 
of the contents of this book and specifically disclaim any implied 
warranties of merchantability or fitness for a particular purpose. No 
warranty may be created or extended by sales representatives or written 
sales materials. The advice and strategies contained herein may not be 
suitable for your situation. You should consult with a professional where 
appropriate. Neither the publisher nor author shall be liable for any loss of 
profit or any other commercial damages, including but not limited to 
special, incidental, consequential, or other damages. 


For general information on our other products and services or for technical 
support, please contact our Customer Care Department within the United 
States at (800) 762-2974, outside the United States at (317) 572-3993 or 
fax (317) 572-4002. 


Wiley also publishes its books in a variety of electronic formats. Some 
content that appears in print may not be available in electronic formats. 
For more information about Wiley products, visit our web site at 
www.wiley.com. 


Library of Congress Cataloging-in-Publication Data: 


Shiskowski, Kenneth, 1954— 

Principles of linear algebra with Mathematica / Kenneth Shiskowski, Karl 
Frinkle. 

p. cm. — (Pure and applied mathematics) 

Includes index. 

ISBN 978-0-470-63795-1 (hardback) 

1. Algebras, Linear—Data processing. 2. Mathematica (Computer file) I. 
Frinkle, Karl, 1977- II. Title. 

QA185.D37S454 2011 


512'.5028553—dc22
2011006420 


With all textbooks, one should attempt to be consistent with notation, not 
only within the text, but also within the field of mathematics on which it is 
based. For the most part, we have done this. 


Table of Symbols and Notation

Symbol                                                      Meaning
$\mathbf{B}, \mathbf{K}, \mathbf{Q}$                        Bold capital letters designate sets of objects, usually vectors, or a field
$\mathbb{R}, \mathbb{C}$                                    Real and complex numbers, respectively
$\mathbb{R}^n, \mathbb{C}^n$                                n-tuples of real and complex numbers, respectively
$\mathbb{R}^{m \times n}, \mathbb{C}^{m \times n}$          m x n matrices with real and complex entries
$\mathcal{S}, \mathcal{T}, \mathcal{R}$                     Math script capital letters denote vector spaces
$\dim(\mathcal{S})$                                         Dimension of a vector space $\mathcal{S}$
uU, T êk                                                    Lowercase letters are designated as scalars
R, v,e                                                      Lowercase letters with arrows over them are vectors, or column matrices
$\langle 1, 2, -1 \rangle, \langle x_1, x_2 \rangle$        Vectors expressed in component form
$\vec{x} \cdot \vec{y}$                                     Dot product of two vectors
$\vec{x} \times \vec{y}$                                    Cross product of two vectors
$\mathrm{proj}_{\vec{V}}(\vec{W})$                          Projection of $\vec{W}$ onto $\vec{V}$
$\mathrm{comp}_{\vec{V}}(\vec{W})$                          Component of $\vec{W}$ onto $\vec{V}$
$A, C, X$                                                   Single capital letters represent matrices
$AB, AX$                                                    Matrix multiplication has no symbol; two matrices in sequence implies multiplication
$A^T, A^{-1}$                                               Transpose, inverse of a matrix
$(A|B)$                                                     Augmented matrix, with A on the left, B on the right
$A_{2,3}, B_{j,k}$                                          Entries of a matrix are indexed by row, column
$\det(A), \mathrm{adj}(A)$                                  Determinant, adjoint of a matrix
$p(A)$                                                      Pseudoinverse of a matrix
$T: \mathcal{R} \to \mathcal{S}$                            Linear map from vector space $\mathcal{R}$ to vector space $\mathcal{S}$
$\mathrm{Ker}(T), \mathrm{Im}(T)$                           The kernel, image of a linear map T
$\int_a^b f(u)\,du$                                         Integral of f(u) with respect to u on [a,b]
$\prod_{j=1}^{n} z_j$                                       The product $z_1 z_2 \cdots z_n$
$\sum_{j=1}^{n} z_j$                                        The sum $z_1 + z_2 + \cdots + z_n$
$\nabla D(\vec{x})$                                         Gradient of vector-valued function D
$\frac{dF}{dx}, \frac{\partial F}{\partial x}$              Derivative, partial derivative, of F with respect to x




Preface 


This book is an attempt to cross the gap between beginning linear algebra 
and the computational linear algebra that one encounters more frequently 
in applied settings. The underlying theory behind many topics in the field 
of linear algebra is relatively simple to grasp; however, to actually apply 
this knowledge to nontrivial problems becomes computationally intensive. 
To do these computations by hand would be tedious at best, and many 
times simply unrealistic. Furthermore, attempting to solve such problems 
by the old pencil-and-paper method does not give the average reader any 
extra insight into the problem. Mathematica® allows readers to overcome 
these obstacles, giving them the power to perform complex computations 
that would take hours by hand, and can help to visualize many of the 
geometric interpretations of linear algebra topics in two and three 
dimensions in a very intuitive fashion. We hope that this book will 
challenge the reader to become proficient in both theoretical and 
computational aspects of linear algebra. 


Overview of the Text 


Chapter 1 of this book is a brief introduction to Mathematica and will help 
the reader become more comfortable with the program. This chapter 
focuses on the commands and programming most commonly used when 
studying linear algebra and its applications. Mathematica commands will 
always be in bold, with output (if any) displayed in a left-justified fashion 
below each Mathematica command. Readers can enter these commands 
and obtain the same results, assuming that they have entered the 
commands correctly. Note also that all of the images in this book were 
produced with Mathematica. The overall intent of this book is to use 
Mathematica to enhance the concepts of linear algebra, and therefore 
Mathematica is integrated into this book in a very casual manner. Where 
one normally explains how to perform some operation by hand in a 
standard text, we often simply use Mathematica commands to perform the 
same task. Thus, the reader should attempt to become familiar with the 
Mathematica syntax as quickly as possible. 


At the end of each section, you will find two types of problems: 
“Homework problems” and “Mathematica problems”. The former
consists of strictly pencil-and-paper computation problems, inquiries into
theory, and questions about concepts discussed in the section. The idea 
behind these problems is to ensure that the reader has an understanding of 
the concepts introduced and can put them to use in problems that can be 
worked out by hand. For example, Mathematica can multiply matrices 
together much faster than any person can and without any algebraic 
mistakes, so why should the reader ever perform these tasks by hand? The 
answer is simple: In order to fully grasp the mechanics of matrix 
multiplication, simple problems must be worked out by hand. This manual 
labor, although usually deemed tedious, is an important tool in learning 
reinforcement. The “Mathematica problems” portion of the homework 
typically involves problems that would take too long, or would be too 
computationally complex, to solve by hand. There are many problems in 
the “Mathematica problems” portion that simply ask you to verify your 
answers to questions from the preceding “Homework problems”, implying 
that you can think of Mathematica as a “solutions manual” for a large 
percentage of this text. You will also notice that several sections are 
missing the “Homework problems” section. These sections correspond to 
special topics that are discussed because they can be explored in detail 
only with Mathematica. 


Website and Supplemental Material 


We suggest that students and instructors alike visit the book’s companion 
Website, which can be found at either of the following addresses: 


http://carmine.se.edu/kfrinkle/PrinciplesOfLinearAlgebraWithMathematica


http://people.emich.edu/kshiskows/PrinciplesOfLinearAlgebraWithMathematica


At these locations, you can download Mathematica notebooks, 
corresponding to each section’s Mathematica commands, along with 
many other resources. These files can be used with the book to enable the 
reader to do problems or practice the material without retyping the entire 
Mathematica code. We highly suggest that all readers unfamiliar with 
Mathematica (and even those who are) read over the relevant sections of 
the “Introduction to Mathematica” notebook before they get too far into
the book in order to understand the Mathematica code more fully. 
Specifically, we suggest looking at plotting/graphing material, differences 
between sets, lists and strings, and expressions versus functions and how 
Mathematica uses each. “Homework problems” solutions and 
“Mathematica problems” notebooks are also available for download. 


Suggested Course Outlines 


It would be nice if we could always cover all of the topics that we wanted 
to in a given course. This rarely happens, but there are obviously core 
topics that should be covered. Furthermore, some of the advanced topics 
require knowledge beyond what students in a basic linear algebra course 
may have. The appropriate prerequisites for this course would be 
trigonometry and a precalculus course in algebra. Also, a computer 
programming course would be helpful because we are using Mathematica. 
A year-long course in calculus would also be beneficial in regard to 
several topics. Here is a list of sections that require advanced knowledge: 

• Section 7.2: Differentiation

• Section 7.3: Multivariable calculus

• Section 10.1: Green’s theorem

• Section 10.3: Divergence theorem and double integrals

• Section 11.3: Gradients and Lagrange multipliers

• Section 12.4: Linear differential equations


We suggest that as much of the book be covered as possible, but here is 
the minimum suggested course outline: 


Chapter 1     Sections 1.1-1.7           2 lectures
Chapter 2     Sections 2.1-2.3           4 lectures
Chapter 3     Sections 3.1-3.2           3 lectures
Chapter 5     Sections 5.1-5.6           8 lectures
Chapter 6     Sections 6.1-6.4           5 lectures
Chapter 8     Sections 8.1-8.5           7 lectures
Chapter 9     Sections 9.1-9.4           5 lectures
Chapter 11    Sections 11.1-11.3         3 lectures
Chapter 12    Sections 12.1-12.3, 12.5   3 lectures

Total                                    40 lectures




On inspection of this outline, you will notice that Chapters 4, 7, and 10 
have been completely omitted. Chapter 4 has interesting applications of 
matrix multiplication to geometry, business, finance, and curve fitting, and 
we highly suggest covering Sections 4.1 and 4.4. Curve fitting is covered 
in Section 4.2, but is covered in greater depth in Chapter 11, where 
pseudoinverses and the method of least-squared deviation are introduced. 
Chapter 7 contains applications of the information learned about vectors in 
Chapter 6. If you wish to cover any of the topics in Chapter 10, we highly 
suggest that you cover Section 7.1. Chapter 10 is a fun chapter on linear 
maps and how they affect geometric objects. Affine maps are included in 
this chapter and should be given serious consideration as a topic to cover. 


Final Remarks 


We hope that both students and instructors will find this book to be a 
unique read. Our goal was to tell a story, rather than follow the standard 
textbook formula of definition, theorem, example, and then repeat for 500 
pages. We also hope that you really enjoy using Mathematica both to
explore the geometric and computational aspects of linear algebra, and to 
verify your pencil-and-paper work. We very much would like to hear your 
comments. Some of the questions we would always like answered, from 
both the student and the instructor, follow: 

1. Were there topics that were difficult to grasp from the explanation
and examples given? If so, what would you suggest that we add or change
to help make comprehension easier?

2. How did you enjoy the mixture of homework and Mathematica problems?
Did you gain anything from verifying your answers to the homework
problems with Mathematica?

3. What were some of your favorite or least favorite sections, and why?

4. Do you feel there were important topics, integral to a first-semester
course in linear algebra, that were missing from this text?

5. Did embedding Mathematica commands and output within the actual
explanation of topics help to illustrate the topics?

6. Overall, what worked the best for you in this text, and what really
did not work?




It would be wonderful if this text, in its first edition, were free of errors: 
both grammatical and mathematical. However, no matter how many times 
we read and proofread this text, it is a certainty that something will be 
missed. We hope you contact us with any and all mistakes that you have 
found, along with any comments and suggestions that you may have. 


Acknowledgments 


First, we would like to express our thanks to Jacqueline Palmieri, Kellsee 
Chu, Stephen Quigley, and Susanne Steitz-Filler of John Wiley & Sons, 
Inc. for making the entire process, from the original proposal, to project 
approval, to final submission, incredibly smooth. The four of you were 
supportive, encouraging, very enthusiastic, and quick to respond to any 
questions that arose over the course of this project. We appreciate this 
very much. We would also like to thank the following individuals who 
were involved in the original peer review process: 

Derek Martinez, Central New Mexico University 

Dror Varolin, Stony Brook University 

Gian Mario Besana, DePaul University 

Chris Moretti, Southeastern Oklahoma State University 

Andrew Ross, Eastern Michigan University 


In addition, Chris Moretti spent a significant amount of time patiently 
answering many of our questions and wrote numerous manipulation 
procedures and standalone notebooks for the Website, some of which 
appear in this text. His comments and suggestions really made this book 
more seamlessly integrate with Mathematica. A special thanks also goes 
to Bobbi Page, who took the time to read large portions of early drafts of 
this book, pored over the copious copyedits, and made many invaluable 
suggestions. The four successive spring semester linear algebra students at 
Southeastern Oklahoma State University deserve a warm round of 
applause for being guinea pigs and error hunters. Thanks also go out to
the countless students from the many courses that Dr. Shiskowski has 
taught at Eastern Michigan University. We would also like to thank Mark 
Bickham, whose idea for a title to this book finally made both authors 
happy. Thanks again to everyone who was involved in this project, at any 
point, at any time. If we forgot to add your name this time around, perhaps 
you will make it into the second edition. 




Kenneth Shiskowski
Eastern Michigan University
kenneth.shiskowski@emich.edu

Karl Frinkle
Southeastern Oklahoma State University
kfrinkle@se.edu




Chapter 1 


An Introduction to 
Mathematica 


1.1 The Very Basics 


Mathematica is an extremely powerful mathematical software package (or
computer algebra system) that incorporates text editing, mathematical
computation, and programming as well as 2D and 3D graphics
capabilities. You can literally write a complete mathematics textbook
using only Mathematica, with all of the text and graphics in one
smoothly flowing document. If you have never used Mathematica before, or
have used it only slightly, then it will take some effort to learn how it
works. Believe us, it is well worth the time: you gain the ability to do
almost anything mathematical that a computer might be able to do for you.
In this introduction to Mathematica, you will see only a fraction of its
capabilities, but hopefully enough to get you well on your way in doing
2D and 3D graphics, solving equations, defining and using functions,
lists, and matrices, and some basic mathematical programming.


This chapter discusses the fundamentals of using Mathematica for the 
novice user. If you are already familiar with Mathematica, you may wish 
to skip this chapter, although we warn you that to do so would be at your 
own risk. The new user of Mathematica will find it quite difficult in the 
beginning, but with practice and patience, you will master all of the basics 
and in time come to enjoy using Mathematica. 




Mathematica files are called notebooks, and in a notebook you can place
text along with input commands and their associated outputs, which can be
almost anything: graphics, tables or lists, and functions. You can
group the material in a notebook into different types of cells, which are
indicated on the right side of the notebook by brackets. At the top of the
notebook you will see the tab Palettes, and under it is the Writing
Assistant, which will allow you to create new cells and/or modify cells.
You can use Writing Assistant to change the font, color, and size of the
text in your cells, and you can also do this using the Format tab at the top
of the notebook. The word processing capabilities of Mathematica are
similar to those of Microsoft Word, with CTRL+C as copy and CTRL+V as
paste. The commands ALT+7 and ALT+9 create a Text cell and an Input cell,
respectively, after a horizontal line break between cells. Almost all
Mathematica cells are Input or Text cells, or Output cells that are created
when you activate an Input cell. Input and Output cells normally come in
pairs, with the Output directly following its Input. You can also create
a new cell after a line break by typing in some text, where you can control
the type of cell you are writing in by using the menu that is open at the
upper left of the screen next to the Save (or disk) icon.


Each section or chapter of a notebook file in Mathematica should be 
created as a section where the first cell of the section is a Subtitle cell that 
can be created by placing a horizontal bar between or just after a cell and 
then choosing Subtitle from the pulldown menu at the very top left of the 
lower ruler at the top of your screen. In order to create a subsection of this 
section (or chapter), do the same as just described but choose the 
Subsection from this menu. If you have not already used the Window tab 
to insert the Toolbar in your notebook, then please do so now. With the 
Toolbar in place, you can now change the type of cell you are in by using 
the pulldown menu at its far left. The Ruler can also be inserted into your
notebook from the Window tab if you want it. Note that for those of us
who like our text in a larger style, Window also has a Magnification 
feature that is quite handy. 


If you wish to delete a cell (use DELETE) or modify its entire contents in
some way, then click on the cell tag or bracket on the right and then carry
out the desired operation using Writing Assistant or the tabs at the top of
the screen, or simply press DELETE for a complete deletion. In text, in
order to create a new paragraph in a cell, use ENTER. To do the same in an
Input cell where commands are placed, use ENTER as well. If you wish to
split a cell, then use CTRL+SHIFT+D with the cursor at the location of the
split. In using Writing Assistant or any pulldown tab, if you click on the
triangles on the left you will open or close one of the sections inside this
tab. Note that text paragraphs are not necessarily indented automatically,
so you must indent them yourself manually if you want this to happen.


If you wish to close a group of cells and see only the first cell of the group 
(which should be the title cell of the group), then double-click on the 
far-right bracket for the group. You will then see a cell bracket with an 
arrow to the right of the cell bracket of the title or first cell of the group. If 
you double-click on this arrow, then you can open all the cells of this 
group. The copy (CTRL+C) and paste (CTRL+V) features of Mathematica
are the same as those of Microsoft Word and other software. If you wish to
change the size, font, or other feature of a collection of cells, select one of
them by clicking on its bracket and then hold the CTRL key down while
you select the rest of the cells. Then go to the Format tab or other location
and carry out your change.


It is strongly recommended that you save (use CTRL+S) constantly since,
like all software, Mathematica can glitch, which could cause you to lose
some or all of your material. You should have backup copies of all of your
work on a separate computer or flash drive since, from our own personal
experience, we know that unfortunate problems can occur.


If you are using Mathematica to do homework problems, it is strongly 
suggested that you place each problem in a single group of cells with the 
first cell as the title of the problem. This will make it much easier to 
organize your work both for yourself and the instructor who may read 
your material. After you have finished working in a particular 
Mathematica notebook, it is also recommended that you delete all of your 
output from the file unless it will take too long to recompute it. Most, if 
not all, of the size of a Mathematica file will be due to graphics, especially 
3D graphics, and such files can become very large and consequently take
Mathematica quite a while to open or save, and at such times an error can 
occur. Under the tab Cell, you have the command Delete All Output, 
which removes all output from the entire file—you might use this 
periodically while working in a notebook in order to shorten the file. 
When you reopen a notebook where all output has been deleted, you can 
reconstruct it all by going to the tab Evaluation and using the command 
Evaluate Cells; the Input cells will then be evaluated from the first one of 
the notebook to the last one. 


If a Mathematica calculation is taking too long and/or you notice that 
there is an error in the input, then, in order to terminate the calculation, 
you should go to the Evaluation tab at the top of the screen and select 
Abort Evaluation. This should immediately halt the calculation in its 
tracks unless Mathematica is stuck in some enormous loop and cannot 
find its way out. Then your only alternative might be to use
CTRL+ALT+DELETE and/or turn your computer off, that is, gently pull the
plug on the machine, while apologizing to it.


Beware of using capital letters to define a quantity in Mathematica as it 
might already be a built-in command name that you cannot override with 
something else. You should also avoid using the capital letters C, D and N 
for any kind of variable or name in Mathematica as they are also 
command names. The commands Clear and ClearAll will undefine a
quantity that you have named. If you use the command Exit[], it should 
clear everything from memory that you have defined and Mathematica has 
produced as output by quitting the Mathematica kernel, which is the core 
of Mathematica. 
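
As a small illustration of the naming advice above, the following sketch uses a hypothetical lowercase name, trialValue, which is an assumption for this example and not a name used elsewhere in the text. The values in the comments are what a fresh kernel should return.

```mathematica
(* trialValue is a made-up name; a lowercase first letter cannot collide with a built-in command *)
trialValue = 7;
trialValue + 1            (* 8 *)

Clear[trialValue]         (* removes the value assigned to trialValue *)
trialValue + 1            (* stays symbolic: 1 + trialValue *)

(* N is already a built-in command, which is why it should not be reused as a variable name *)
N[Pi, 5]                  (* 3.1416 *)
```

After Clear, the name is free to be defined again with a different value.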


In order to define or name a quantity in Mathematica, you must first 
decide on an appropriate name that cannot be a previously used name or 
Mathematica command name, nor should it be a common variable name 
like x, y, and z, which you might use in equations or functions/expressions 
as a variable symbol. You can never use the same symbol or name in 
Mathematica for more than one thing. Once you decide on a name such as 
TrialName, then in an Input cell say TrialName = (or :=) where, after the 
equal sign, you must give the expression that is the definition of 
TrialName. In Mathematica, an equal sign = is used for definitions, while
a double equal sign == is used in equations. You can use != for not
equals. The := is often used for defining functions since it produces no
output and delays evaluation of the right-hand side until the named
quantity is used.
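
The three symbols can be compared side by side in a short sketch. The names trialName and sq and the variable w are illustrative assumptions chosen for this example only.

```mathematica
trialName = 2 + 3          (* = evaluates the right side now; output 5 *)

sq[t_] := t^2              (* := stores the rule unevaluated; no output *)
sq[4]                      (* 16 *)

trialName == 5             (* == tests equality; True *)
trialName != 6             (* True *)

Solve[w^2 == 9, w]         (* == inside an equation; {{w -> -3}, {w -> 3}} *)
```

Note that the equation handed to Solve uses ==, since a single = there would try to assign a value rather than state an equation.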




If you are using a Mathematica command, but have forgotten how to use 
it, then place the cursor in the middle of the command name and hit the 
F1 key to have Mathematica bring up the Help file for this command
name. You can also go directly to Help and type in the command name 
yourself, especially if you have forgotten its correct spelling. Don’t forget 
that every command name in Mathematica has its first letter capitalized. 
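
Help can also be requested from inside an Input cell. A minimal sketch, using the built-in ? query on a command name that appears later in this chapter:

```mathematica
?FactorInteger             (* prints the usage message for FactorInteger *)
?Factor*                   (* lists the built-in names that begin with Factor *)
```

The second form may be useful when only the beginning of a command name is remembered.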


If you place a semicolon (;) at the end of a named input, then Mathematica 
will not give any associated output even though it internally carried out 
your command and stored it to the name given it. This feature can be 
useful when the output would be very long and you do not need to see it 
all displayed, only have it computed and/or stored. 
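
A brief sketch of the semicolon at work, assuming a hypothetical name bigList for a list that would be tedious to display in full:

```mathematica
bigList = Range[1000];     (* the list 1, 2, ..., 1000 is computed and stored, but not displayed *)
Length[bigList]            (* 1000 *)
Total[bigList]             (* 500500 *)
```

The stored list is fully available to later commands even though it was never printed.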


Mathematica can use standard mathematical notation for powers $k^n$ and
fractions $\frac{a}{b}$, where CTRL+6 after the base k is typed will give a
power location and CTRL+/ after a numerator is typed will create a fraction
and placement for the denominator. In each case both keys must be used
simultaneously. If you use these keys after a space, then Mathematica
creates blank templates for the appropriate quantities to be inserted.
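
Powers and fractions can also be typed on one line with ^ and /, which may be handy when copying commands from a plain text file. A brief sketch with arbitrarily chosen numbers:

```mathematica
2^10                 (* same value as the typeset power; gives 1024 *)
3/12                 (* same value as the typeset fraction; gives the exact result 1/4 *)
(1 + 2)/(3 + 4)      (* parentheses group the numerator and denominator; gives 3/7 *)
```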


Finally, if you are a novice or beginner at using Mathematica, then besides 
this introductory material there are many videos on YouTube that explain 
most of the basic features of Mathematica. It is strongly suggested that 
you seek these out and hopefully will find a few useful ones for doing 
what you are interested in. Mathematica itself has tutorials that you should 
consider using if you find them useful. 


1.2 Basic Arithmetic 


In this section, we will start to use Mathematica to do some basic 
arithmetic and algebra computations. In the arithmetic, which is done first, 
we will add and multiply, factor positive integers into products of powers 
of primes, find the greatest common divisor (GCD) and the least common 
multiple (LCM) of two positive integers, and more. In the algebra, we will 
factor polynomials, divide one polynomial into another to get their
quotient and remainder, solve for the roots of a polynomial and also solve 
equations for their unknowns, and perform other algebraic operations. 


In order to create an Input cell where you can do your calculations, go to
the tab Palettes and bring up Classroom Assistant. Now click on the tab
Create Input Cell with the cursor at the end of our prior work. Now you
will have a new Input cell as part of your current group of cells. Both
palettes, Classroom Assistant and Basic Math Assistant, have symbols
such as π in them as well as the natural number e.


In the first input cell below, you will find the command 1 + 1. If you hit
SHIFT+ENTER with the cursor on this line, then Mathematica will carry
out your command and produce 2 as an output. After you get the output,
Mathematica automatically assumes that you want another Input cell, and
so typing right after an output will be in a new Input cell. You can also go
back and insert a new Input cell by creating a horizontal bar between two
Input cells by clicking on the region between the two cells, and then using
ALT+9, while ALT+7 creates a Text cell. You can insert a Text cell
between Input cells by creating the horizontal line divider between cells
by clicking on the space between the cells and then using Text Cell out of
the tab Text Cells in Writing Assistant. In addition, if you type in a
Mathematica command name such as FactorInteger, but have forgotten
precisely how it works and need its help file, then put the cursor
in the command name and hit F1.


Note that Mathematica will also recognize a space in a product as
multiplication, although for safety's sake you might want to put in all of
your multiplications explicitly. Mathematica gives exact answers in a
calculation if the inputs are all exact values, but any value with a
decimal point in it is treated by Mathematica as an approximation, and it
gives an approximate answer back. The command N[V, m] will give the
approximate value of the quantity V to m digits of precision. Mathematica
can answer most computational questions to an arbitrary number of digits
of precision: look up the option WorkingPrecision to see how it can
be done as an alternative to the use of the command N. In the last example
of this arithmetic section we also multiply four complex numbers using
Product. A complex number in Mathematica is expressed as a + b i for real
numbers a and b, where i is entered as the capital letter I.
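The difference between exact and approximate input can be seen in a few short lines. The following is a minimal sketch; the particular numbers are our own choices for illustration and are not part of the examples that follow:

```mathematica
1/3 + 1/6            (* exact inputs give the exact result 1/2 *)
1/3. + 1/6           (* one decimal point makes the result approximate: 0.5 *)
N[Pi, 20]            (* 3.1415926535897932385 *)
(2 x) /. x -> 3      (* the space between 2 and x means multiplication: 6 *)
```

Each line is meant to be evaluated in its own Input cell, or together in one cell, in which case each line produces its own output.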




In order to get a power of something in Mathematica, that is, to place a
superscript, press Ctrl+6 to get an exponent location after you
have already typed in the base. If you want a fraction in the same standard
way, then type the numerator followed by Ctrl+/ and then the
denominator in the location created:


f = 5x - 2y + 7z == 15;
ContourPlot3D[Evaluate[f], {x, -7, 7}, {y, -7, 7}, {z, -7, 10},
Mesh -> None, ContourStyle -> Red, AspectRatio -> 1/2]


Sum[k^2, {k, 1, 10}]

385

Sum[π^k, {k, 1, 3}]

π + π^2 + π^3

N[Sum[π^k, {k, 1, 3}], 10]

44.01747373

Sum[3.14159265^k, {k, 1, 3}]

44.0175

Product[k, {k, 1, 5}]

120

FactorInteger[90]

{{2, 1}, {3, 2}, {5, 1}}

2^1 3^2 5^1

90

GCD[210, 90]

30

LCM[210, 90]

630

QuotientRemainder[83, 5]

{16, 3}

5 × 16 + 3

83


Now we switch from arithmetic to algebra. Our algebra will be mainly
polynomial and similar to what we did in the arithmetic part above,
although we will find the roots of polynomials as well. After the first two
computations of multiplying out two polynomials and then dividing the
one of larger degree by the smaller-degree one, we will name or define the
two polynomials as Poly1 and Poly2, and then repeat the process to see
that Mathematica understands what we want. Also, we define the list
called QR below, which holds the quotient first and the remainder second
in our division of Poly2 by Poly1. A list is an ordered collection of objects;
Mathematica places { } around its elements, with commas between the
elements. Then QR[[1]] is the quotient and QR[[2]] is the remainder in the
division. We will discuss lists and matrices in the next section.


Mathematica expresses its equations with a double equal sign ==, while it
uses a single equal sign to make a definition or assignment of a quantity to
a name, such as in the use of Poly1 and Poly2 below. Hence, 5x + 3y == 9
is an equation in Mathematica, while Eqn1 = 5x + 3y == 9 assigns the
name Eqn1 to this equation for the time you are using this notebook unless
you decide to change it. In Mathematica, you do not need to insert a
multiplication sign, as Mathematica usually places a space between
objects, which indicates multiplication.


One last bit of useful information is that Mathematica uses the percent
sign % to refer to the last computed output and %% to refer to the
next-to-last computed output. This can be helpful, as we will see below. We
will use % below when we want to change the roots of Poly1 to a set of
rules that can then be substituted back into Poly1 to see that we get 0 (or
very close to 0) back. As well, it is probably better to assign a name to
your quantities in order to be better able to use them later and know what
you are specifically talking about, and we do this for the roots of Poly2:


Expand[(7 x^3 + 5 x^2 - 9 x + 1) (-4 x^6 - x^5 + 3 x^4 - 2 x^3 + x^2 - 8 x + 5)]

5 - 53 x + 98 x^2 - 16 x^3 - 30 x^4 - 31 x^5 + 6 x^6 + 52 x^7 - 27 x^8 - 28 x^9

PolynomialQuotientRemainder[-4 x^6 - x^5 + 3 x^4 - 2 x^3 + x^2 - 8 x + 5,
7 x^3 + 5 x^2 - 9 x + 1, x]

{1179/2401 - 170 x/343 + 13 x^2/49 - 4 x^3/7,
10826/2401 - 7407 x/2401 - 14841 x^2/2401}

Poly1 = 7 x^3 + 5 x^2 - 9 x + 1

1 - 9 x + 5 x^2 + 7 x^3

Poly2 = -4 x^6 - x^5 + 3 x^4 - 2 x^3 + x^2 - 8 x + 5

5 - 8 x + x^2 - 2 x^3 + 3 x^4 - x^5 - 4 x^6

Expand[Poly1 Poly2]

5 - 53 x + 98 x^2 - 16 x^3 - 30 x^4 - 31 x^5 + 6 x^6 + 52 x^7 - 27 x^8 - 28 x^9

QR = PolynomialQuotientRemainder[Poly2, Poly1, x]

{1179/2401 - 170 x/343 + 13 x^2/49 - 4 x^3/7,
10826/2401 - 7407 x/2401 - 14841 x^2/2401}

Expand[QR[[1]] Poly1 + QR[[2]]]

5 - 8 x + x^2 - 2 x^3 + 3 x^4 - x^5 - 4 x^6
NRoots[Poly1 == 0, x]

x == -1.58331 || x == 0.120547 || x == 0.74848

{ToRules[%]}

{{x -> -1.58331}, {x -> 0.120547}, {x -> 0.74848}}

Poly1 /. %

{0., 2.08167×10^-17, 0.}


Poly2Roots = N[Roots[Poly2 == 0, x], 10]

x == -1.532415683 || x == 0.6278496205 ||
x == -0.4656906912 - 0.9837390675 i ||
x == -0.4656906912 + 0.9837390675 i ||
x == 0.7929737223 - 0.6840534163 i ||
x == 0.7929737223 + 0.6840534163 i

Rules = {ToRules[Poly2Roots]}

{{x -> -1.532415683}, {x -> 0.6278496205},
{x -> -0.4656906912 - 0.9837390675 i},
{x -> -0.4656906912 + 0.9837390675 i},
{x -> 0.7929737223 - 0.6840534163 i},
{x -> 0.7929737223 + 0.6840534163 i}}

Poly2 /. Rules

{0.×10^-8, 0.×10^-9, 0.×10^-8 + 0.×10^-8 i,
0.×10^-8 + 0.×10^-8 i, 0.×10^-8 + 0.×10^-8 i, 0.×10^-8 + 0.×10^-8 i}

Rules[[2]][[1]][[2]]

0.6278496205

Poly2Roots[[2]][[2]]

0.6278496205
Poly = Expand[-4 Product[x - Poly2Roots[[k]][[2]], {k, 1, 6}]]

(5.00000000 + 0.×10^-9 i) - (8.0000000 + 0.×10^-9 i) x +
(1.0000000 + 0.×10^-9 i) x^2 - (2.0000000 + 0.×10^-8 i) x^3 +
(3.00000000 + 0.×10^-9 i) x^4 - (1.00000000 + 0.×10^-9 i) x^5 - 4 x^6

Sum[Re[Coefficient[Poly, x, k]] x^k, {k, 0, 6}]

5.00000000 - 8.0000000 x + 1.0000000 x^2 -
2.0000000 x^3 + 3.00000000 x^4 - 1.00000000 x^5 - 4 x^6

Chop[Poly, 10^-7]

5.00000000 - 8.0000000 x + 1.0000000 x^2 -
2.0000000 x^3 + 3.00000000 x^4 - 1.00000000 x^5 - 4 x^6

Product[k - (k + 2) I, {k, 0, 3}]

40 + 160 i


1.3 Lists and Matrices 


Now we look more closely at the different ways in which we can organize 
information into lists and matrices (a list of lists), and how these different 
structures work and can be manipulated. In Mathematica, there are no 
sets, since Mathematica requires that an ordering be placed on its data, 
and so it deals with lists that can be treated as sets if you ignore the order 
of the elements in the list. The list L of the elements a, b, and c in this 
order is given as L = {a, b, c} in Mathematica. Shortly, we will look at 
taking the union, intersection, complement, and concatenation (or joining) 
of lists as well as taking out parts of a list. The empty list is { }, and L[[k]] 
is the kth element of list L.
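Taking out parts of a list can be sketched in a few lines. The negative index and the list of positions shown here go slightly beyond what the text describes; they are included only as an illustration of the same double-bracket notation:

```mathematica
L = {a, b, c};
L[[2]]          (* b, the second element *)
L[[-1]]         (* c, counting from the end of the list *)
L[[{1, 3}]]     (* {a, c}, the sublist of positions 1 and 3 *)
Length[{}]      (* 0, the empty list has no elements *)
```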


A string S is a grouping or ordered collection of characters or symbols 
with quotes around them such as S = “Mary had a little lamb”. The 
string assigned to the name S is the sentence between the two double 
quotes. Strings often show up as the title to Mathematica plots and similar 
structures. 




A sequence or table in Mathematica is a list of objects created from a
formula; for example, {1, 4, 9} can be created as the output from the input
Table[k^2, {k, 1, 3}].
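A few variations on Table may help to show how the iterator controls the resulting list. These are our own small examples, given as a sketch rather than as part of the computations that follow:

```mathematica
Table[2 k, {k, 1, 5}]                 (* {2, 4, 6, 8, 10} *)
Table[k^2, {k, 0, 10, 5}]             (* a step of 5 gives {0, 25, 100} *)
Table[i + j, {i, 1, 2}, {j, 1, 3}]    (* two iterators give a list of lists: {{2, 3, 4}, {3, 4, 5}} *)
```

The last form is one way to build a matrix from a formula for its entries.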

L1 = {a, b, 1, 3, c, 5} 

{a, b, 1, 3, c, 5} 

L2 = {7, 2, 5, 2, 5, 1, c, b} 

{7, 2, 5, 2, 5, 1, c, b}

Union[L1, L2]

{1, 2, 3, 5, 7, a, b, c}

Intersection[L1, L2]

{1, 5, b, c}

Complement[L1, L2]

{3, a}

Join[L1, L2]

{a, b, 1, 3, c, 5, 7, 2, 5, 2, 5, 1, c, b} 

Length[L1] 

6 

L1[[2]] 

b 


Squares = Table[k^2, {k, 1, 10}]

{1, 4, 9, 16, 25, 36, 49, 64, 81, 100}

Sum[Squares[[k]], {k, 1, 10}]

385


Now we switch to matrices or two dimensional arrays of rows and 
columns of entries. Mathematica considers a matrix M to be a list of lists 
where each element of M is one of the rows of the matrix M, and the rows 
must all have the same length. As such, M = {{1,2,3}, {4,5,6}, {7,8,9}} has 
{1,2,3} as its first row, {4,5,6} as its second row, and finally {7,8,9} as its 
third row. We will look at examples of adding, multiplying, inverting, 
transposing, and finding determinants of matrices where multiplication of 
matrices is indicated by a period or dot (.) between their names as given in 
the fifth line of input below. 
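Since a space between two names also means multiplication, it is worth seeing how the dot differs from a plain product of two matrices. The matrices p and q below are hypothetical and serve only this comparison:

```mathematica
p = {{1, 2}, {3, 4}};
q = {{0, 1}, {1, 0}};
p . q     (* matrix product: {{2, 1}, {4, 3}} *)
p q       (* entry-by-entry product, not matrix multiplication: {{0, 2}, {3, 0}} *)
```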


In order to define a matrix M and have it displayed in the proper matrix 
format as rows and columns, use round parentheses around your definition 
of M followed by // MatrixForm. Then if you wish to manipulate M with 
other matrices so defined, there will be no problems, but remember to 
always use // MatrixForm after your computations or definitions to get 
the proper matrix format: 
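The reason for the parentheses can be sketched as follows. Without them, the name is assigned the formatted display object rather than the matrix itself. The names m1 and m2 are hypothetical:

```mathematica
m1 = {{1, 2}, {3, 4}} // MatrixForm;     (* m1 now holds a MatrixForm wrapper, not a matrix *)
Head[m1]                                 (* MatrixForm *)
(m2 = {{1, 2}, {3, 4}}) // MatrixForm    (* m2 holds the list of lists; only the display is formatted *)
Head[m2]                                 (* List *)
Det[m2]                                  (* -2 *)
```

Commands such as Det or Inverse expect the list of lists, so m2 can be used in further computations while m1 cannot.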


M = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}} 
{{1, 2, 3}, {4,5,6}, {7,8,9}} 


MatrixForm[M]

1 2 3
4 5 6
7 8 9

K = {{-5, 7, 1}, {0, -4, 6}, {2, -1, 9}}

{{-5, 7, 1}, {0, -4, 6}, {2, -1, 9}}

MatrixForm[K]

-5  7  1
 0 -4  6
 2 -1  9

(L = M + K) // MatrixForm

-4  9  4
 4  1 12
 9  7 18

MatrixForm[M.K]

  1  -4   40
 -8   2   88
-17   8  136

Det[M]

0

Det[K]

242

Det[M.K]

0


Inverse[M]

Inverse::sing : Matrix {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}} is singular. >>

Inverse[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}]

Inverse[K] // MatrixForm

-15/121  -32/121   23/121
  6/121  -47/242   15/121
  4/121    9/242   10/121

Transpose[M] // MatrixForm

1 4 7
2 5 8
3 6 9


1.4 Expressions versus Functions


This section will look at the differences in Mathematica between
expressions and functions and how to manipulate and evaluate both. You
should think of an expression as the rule for some function f; that is, if
f(x) = 5x + 9, then 5x + 9 is an expression which is the rule of the function f.
Mathematica treats a function very differently than it treats an
expression: you should think of an expression as a string of symbols,
while a function is a string of symbols (its rule or expression) together
with a method of evaluating its expression at different values of the
variable(s) in the expression.


Let’s now look at the difference in Mathematica between the expression g
= 5x + 9 and the function f with rule f(x) = 5x + 9, and how each of them
can be evaluated at x = 1. Note that normal function evaluation can be
done so that f evaluated at x = 1 is f[1], while the expression g can be
evaluated at x = 1 by the substitution command g /. x -> 1. If you want to
compose the function f with the built-in function Sin[x], then f[Sin[x]]
will do it. This composition can also be done using f@Sin[x] or Nest[f,
Sin[x], 1], and its result is an expression, not a function. If you want f to
be composed with itself k times, then use Nest[f, f[x], k].
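The contrast between evaluating a function and substituting into an expression, and the effect of Nest, can be sketched with a second small function. The name p is hypothetical and is used only so as not to interfere with f and g:

```mathematica
p[x_] := 2 x + 1            (* p is a function: it carries a rule for evaluation *)
p[3]                        (* function evaluation: 7 *)
(2 x + 1) /. x -> 3         (* the same rule as an expression, evaluated by substitution: 7 *)
Nest[p, x, 3]               (* p applied three times to x: 1 + 2 (1 + 2 (1 + 2 x)) *)
Expand[Nest[p, x, 3]]       (* 7 + 8 x *)
```

Note that Nest[p, x, 3] starts from x itself, so it applies p exactly three times, whereas Nest[p, p[x], 3] would apply it four times in total.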


g = 5x + 9

9 + 5 x

g /. x -> 1

14

f[x_] = 5x + 9

9 + 5 x

f[1]

14

g /. x -> Sin[x]

9 + 5 Sin[x]

f[Sin[x]]

9 + 5 Sin[x]

h = f@f[x]

9 + 5 (9 + 5 x)

Simplify[h]

54 + 25 x

Nest[f, f[x], 1]

9 + 5 (9 + 5 x)

Simplify[%]

54 + 25 x

Nest[f, f[x], 1] /. x -> 2

104

Nest[f, f[x], 5]

9 + 5 (9 + 5 (9 + 5 (9 + 5 (9 + 5 (9 + 5 x)))))

Simplify[%]

35154 + 15625 x


As the last topic of this section, let’s do an example of a piecewise 
function and its graph (see Fig. 1.1). A piecewise function is one whose 
rule is given in parts or pieces where each part is used only when certain 
conditions are satisfied. Happily, Mathematica has a Piecewise command 
that we can utilize. Note that in defining the function f(x) below that && 
is the Mathematica notation for the logical AND in joining two statements: 


f[x_] = Piecewise[{{Sin[x], x < -2}, {Cos[x], x > -2 && x < 3},
{-x + 5, x > 3}}]

Sin[x]    x < -2
Cos[x]    x > -2 && x < 3
5 - x     x > 3
0         True

Plot[f[x], {x, -7, 27}, PlotStyle -> {Red, Thick}]


Figure 1.1: Plot of the piecewise function f. 




1.5 Plotting and Animations 


We begin our investigation into plotting with the basic 2D plotting of the
graphs of functions y = f(x) and expressions as well as plotting parametric
curves x = f(t), y = g(t). We will plot single functions and parametric
curves as well as several together and in combination in different colors.
Besides these two types of curves, we will implicitly plot equations in the
two variables x and y such as the unit circle with equation x^2 + y^2 = 1 or
something more complicated. The implicit plotting of equations can be
done using the ContourPlot command and the ContourStyle option to
control color, thickness, etc. of the plot.


Let’s begin with simple function or expression plotting, and then move on
to parametric curve plotting. In the first example, remember to get base e
for the exponential function from the palette Basic Math Assistant. Also,
in order to get a superscript or exponent, use Ctrl+6 after the
base is already in place. In order to create a fraction, first type the
numerator and then hit Ctrl+/ to be able to place the denominator.




The plot option PlotStyle can control color and thickness for graphics
and control whether the graph lines are solid or dashed, except when doing
implicit plotting of equations using ContourPlot, where PlotStyle is
replaced by ContourStyle (see Figs. 1.2–1.7).
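As a minimal sketch of these options, the following two plots use a solid and a dashed style. The functions plotted here are our own choices and are not among the figures of this section:

```mathematica
(* two curves, each with its own directive: one thick and solid, one dashed *)
Plot[{Sin[x], Cos[x]}, {x, 0, 2 Pi},
 PlotStyle -> {Directive[Red, Thick], Directive[Blue, Dashed]}]

(* for implicit plots the corresponding option is ContourStyle *)
ContourPlot[x^2 + y^2 == 1, {x, -1, 1}, {y, -1, 1},
 ContourStyle -> Directive[Blue, Dashed]]
```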


f[x_] = x Sin[x^2 E^x]
x Sin[e^x x^2]
Plot[f[x], {x, -1, 2}, PlotStyle -> Red, PlotRange -> {-2, 2}]


Figure 1.2: Output of the Plot command with various options. 




g= e-3 Sin[3x] 


ari Sin{3x] 


Plot[{f[x], g}, {x, -1, 2}, PlotStyle -> {Directive[Red, Thickness[.005]],
Directive[Blue, Thickness[.01]]}]


Figure 1.3: Plot of f and g together with separate options for each curve.




ParametricPlot[{Sin[2 t], Sin[3 t]}, {t, 0, 2 π}, PlotStyle ->
Directive[Blue, Thick]]


Figure 1.4: Plot of the parametric equation (sin(2t), sin(3t)).


ParametricPlot[{{Sin[2 t], Sin[3 t]}, {Sin[t], Cos[t]}}, {t, 0, 2 π},
PlotStyle -> {Directive[Blue, Thick], Directive[Red, Thick]}]


Figure 1.5: Two parametric curves plotted together. 




ContourPlot[{x^2 + y^2 == 1, x^4 + y^4 == 1}, {x, -1, 1}, {y, -1, 1},
ContourStyle -> {Directive[Red, Thick], Directive[Blue, Thick]}]


Figure 1.6: Example of the ContourPlot command with two implicitly 
defined relations. 




— 5)? = 2 — 5)? — 10)? 
ContourPlot| {= 5)" + G= Ny == 1, (x ) e (y ) 
49 144 49 144 


1}, {x,—10,25}, {y,—10,25}, ContourStyle— {Directive[Red,Thick], 
Directive[Blue, Thick] }| 


Figure 1.7: Second example of the ContourPlot command with two 
implicitly defined relations. 




Now we turn our attention to creating a movie or animation in the
xy-plane whose frames consist of plots of the three different types. Let’s
begin by plotting the function y = sin(x) from x = 0 to x = A, where the
animation parameter A goes from 0 to 4π (see Figs. 1.8 and 1.9). It is
followed by running two sine animations at once, y = sin(x) and
y = sin(x + A), based on the same animation parameter A:


Animate[Plot[Sin[x], {x, 0, A}, PlotRange -> {{0, 4 π}, {-1.01, 1.01}},
PlotStyle -> Directive[Blue, Thick]], {A, 0.01, 4 π}, AnimationRunning
-> False]


Figure 1.8: Animation of the plot of sin(x) for x ∈ [0, A], here A = 2.67.




Animate[{Plot[Sin[x], {x, 0, A}, PlotRange -> {{0, 4 π}, {-1, 1}},
PlotStyle -> Directive[Blue, Thick]], Plot[Sin[x + A], {x, 0, A},
PlotStyle -> Directive[Red, Thick]]}, {A, 0.01, 4 π}, AnimationRunning
-> False]


Figure 1.9: Animation of sin(x) and sin(x + A), x ∈ [0, A]. Here A = 2.62.


We next do an animation (see Fig. 1.10) involving implicit plotting of an 
ellipse where the center is moving along the circle with center at the origin 
and radius 10. Here we make use of the Epilog option to put into each 
frame of our movie the circle of the ellipses’ centers: 


ellipses = (x - 10 Sin[A])^2/4. + (y - 10 Cos[A])^2/25. == 1;
Animate[ContourPlot[Evaluate[ellipses /. A -> B], {x, -15, 15}, {y,
-15, 15}, ContourStyle -> Directive[Blue, Thick], PlotPoints -> 100,
Epilog -> {Red, Thick, Circle[{0, 0}, 10]}], {B, 0, 2 π}, AnimationRate
-> .15, AnimationRunning -> False]


Figure 1.10: Animation of the rotation of the ellipse, here B = $x. 


It is time to move function, parametric, and implicit plotting from the 
xy-plane to xyz-space. We begin by plotting functions z = f(x, y), which 
give surfaces in xyz-space (see Fig. 1.11). This is followed by parametric 
curve and surface plotting: 




f[x_, y_] = Sin[x + y] Cos[x - y] + 3;
g = Sinx y] + e~ +”); 


Plot3D[{f[x, y], g}, {x, -3, 3}, {y, -2, 2}, PlotStyle -> {Red, Blue},
AxesLabel -> Automatic]


Figure 1.11: Plot of two surfaces. 


Now we examine parametric curve plotting in space. A parametric curve,
or spacecurve (see Figs. 1.12 and 1.13), is of the form x = f(t), y = g(t),
z = h(t) with one independent variable t. For spacecurves, we use the
ParametricPlot3D command, which is nearly identical to the
ParametricPlot command that was previously introduced.


ParametricPlot3D[{Cos[2 t], Sin[4 t], Cos[6 t]}, {t, 0, 2 π}, PlotStyle
-> Directive[Blue, Thick], PlotPoints -> 250, AxesLabel -> Automatic]


Figure 1.12: Plot of a spacecurve. 




ParametricPlot3D [{exts Cos|t], e1% Sin[t], 5b {t, 0, 487r}, Plot- 
Style—Directive[Blue, Thick], PlotPoints—+250]| 


Figure 1.13: Plot of a helical spacecurve. 




A parametric surface is given by x = f(u,v), y = g(u,v), z = h(u,v) with two
independent variables u and v. We first plot a torus (Fig. 1.14) followed by
three interlocking mutually perpendicular tori (Fig. 1.15).

ParametricPlot3D[{(7 + 3 Sin[u]) Sin[v], (7 + 3 Sin[u]) Cos[v], 2
Cos[u]}, {u, 0, 2 π}, {v, 0, 2 π}, PlotStyle -> Blue]


Figure 1.14: Plot of the torus, a parametric surface. 




ParametricPlot3D[{{(7 + 3 Sin[u]) Sin[v], (7 + 3 Sin[u]) Cos[v], 2
Cos[u]}, {2 Cos[u], (7 + 3 Sin[u]) Sin[v], (7 + 3 Sin[u]) Cos[v]}, {(7
+ 3 Sin[u]) Sin[v], 2 Cos[u], (7 + 3 Sin[u]) Cos[v]}}, {u, 0, 2 π}, {v,
0, 2 π}, PlotStyle -> {Blue, Red, Green}]


Figure 1.15: Three intersecting tori. 




Next, we do implicit plotting of equations in the three variables x, y, and z,
whose resulting plot gives a surface. We will start with a cylindrical
surface, which is a surface whose equation has only two of the three space
variables in it. Such an equation really describes a curve in the plane of
its two variables (here one with certain radially symmetric properties),
while adding the third variable (direction) results in a surface (see
Figs. 1.16–1.18).


ContourPlot3D[x^2 + z^2 == 9, {x, -3, 3}, {y, -3, 3}, {z, -5, 5},
ContourStyle -> Blue, BoxRatios -> Automatic, AxesLabel -> Automatic]


Figure 1.16: A cylinder defined implicitly. 


ContourPlot3D[{x^2 + y^2 + z^2 == 16, x^2 + y^2 == 9}, {x, -5, 5}, {y,
-5, 5}, {z, -7, 7}, ContourStyle -> {Blue, Red}, BoxRatios -> Automatic,
AxesLabel -> Automatic, Mesh -> None]


Figure 1.17: A cylinder intersecting a sphere. 




Animate[ParametricPlot3D[{(C + A Sin[u]) Sin[v], (C + A Sin[u])
Cos[v], B Cos[u]}, {u, 0, 2 π}, {v, 0, 2 π}, PlotStyle -> Blue,
AxesLabel -> {x, y, z}, PlotRange -> {{-11, 11}, {-11, 11}, {-5, 5}},
PlotPoints -> 100, Mesh -> None], {A, 1, 3}, {B, 1, 4}, {C, 5, 8},
AnimationRunning -> False, ControlType -> Slider, ControlPlacement
-> Right]


Figure 1.18: Animation of parameters in the graph of a torus. The 
parameter values for this frame are A = 1.5, B= 2, and C = 6.5. 




A word of warning in doing 3D animations. This type of animation can 
use a great deal of memory, so we highly recommend that you delete the 
output from memory before closing the file unless it took a great deal of 
time to produce the animation and you do not want to have to do it again. 
We animated the torus with three animation parameters. Play around with 
each slider, corresponding to the three parameters, to see how they affect 
the shape of the torus. 


1.6 Solving Systems of Equations


Mathematica can solve single equations as well as simultaneous systems 
of equations, both linear and nonlinear. It can also find approximate or 
exact solutions, although for exact solutions the equation or system must 
be capable of being solved for exact solutions by some known method. 
Also, Mathematica can find both real and complex solutions. The solution 
given by the command Solve will give a list of replacements for the 
variables solved for in your solution. In order to get the actual values of 
the solutions as a list and only for a specific variable solved for like z, you 
must use the input z /. soln, where soln is the result of Solve and z is one 
of the variables solved for in Solve. The command Solve seeks exact 
solutions while NSolve always gives approximate solutions. 
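The form of the output of Solve, and the extraction of the actual values from it, can be sketched on a small quadratic. The equations here are our own and were chosen only because their roots are easy to check by hand:

```mathematica
soln = Solve[x^2 - 5 x + 6 == 0, x]    (* a list of replacement rules: {{x -> 2}, {x -> 3}} *)
x /. soln                              (* the values themselves as a list: {2, 3} *)
NSolve[x^2 - 2 == 0, x]                (* approximate solutions: {{x -> -1.41421}, {x -> 1.41421}} *)
```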


We begin by solving some single polynomial equations for real and 
complex solutions, both approximate and exact. Solve will give all real 
and complex solutions to a single polynomial equation, that is, it will find 
all of the real and complex roots of any complex coefficient polynomial: 


PolyN = 35 x3 + 4; 
soln = Solve[PolyN == 0, x] 


1/3 afi g2/i 
(a) e fx > -Q}, fx 9 Fah} 




approxsolns = N[soln, 25]

{{x -> 0.2426427503202586581123865 + 0.4202695716429374683285407 i},
{x -> 0.2426427503202586581123865 - 0.4202695716429374683285407 i},
{x -> -0.4852855006405173162247729}}


PolyN /. % 
{0.x10-4+0. x10", 0.x10-4+0.x 107" i, 0.x10- } 


xsolns = x /. approxsolns

{0.2426427503202586581123865 + 0.4202695716429374683285407 i,
0.2426427503202586581123865 - 0.4202695716429374683285407 i,
-0.4852855006405173162247729}

Q = 5 x^3 - 7 x^2 + I x - 4 I

-4 i + i x - 7 x^2 + 5 x^3


soln = Solve[Q == 0, x] 


{69 $+ ee ot 
15 1 1/3 
(4 ((686 + 23851) + 15\/—24693 + 16404) ) 
1/3 
>G ((686 + 2385i) + 15/=24605 + T640) ) }, 
f> 7 2 — i) (1 -iv3) 


15 22/3 ((686 + 2385i) + 15/—24693 + 16404i) /° 
1 1 1/3 
— _ i — a i 
a ( z ((686 + 2385i) + 15V—24693 + 164043) (a + iv3) }, 


f> Kua 5 G ((686 + 23851) + 15/200 FT60) ) (1 -iv3) - 


15 
22/3 ((686 + 2385i) + 15/—24693 + 16404i) '”° 




approxsolns = N[soln, 15]

{{x -> 1.48095933119864 + 0.21092675658740 i},
{x -> 0.430101246109932 - 0.658941196856415 i},
{x -> -0.511060577308570 + 0.448014440269016 i}}


Q /. approxsolns 


{0.x107"9+0.x10-14i, 0.x107!4+40.x107"4 i, 0.x107"4+0.x10- 14a} 


xsolns = x /. approxsolns

{1.48095933119864 + 0.21092675658740 i,
0.430101246109932 - 0.658941196856415 i,
-0.511060577308570 + 0.448014440269016 i}

Chop[Expand[5 (x - xsolns[[1]]) (x - xsolns[[2]]) (x - xsolns[[3]])]]

-4.0000000000000 i + 1.0000000000000 i x - 7.0000000000000 x^2 + 5 x^3

Now we look at some examples of solving systems of equations that are
both linear and nonlinear. We begin by finding the intersection point of
two lines in the xy-plane (Fig. 1.19) followed by finding the intersection
points of two circles (Fig. 1.20):


Eqn1 = 5x + 3y == -4; Eqn2 = -7x + 2y == 6;
Soln = Solve[{Eqn1, Eqn2}, {x, y}]

{{x -> -26/31, y -> 2/31}}

{Eqn1, Eqn2} /. Soln

{{True, True}}




approxsolns = N[Soln, 10]

{{x -> -0.8387096774, y -> 0.06451612903}}

{xsolns, ysolns} = {x, y} /. Flatten[approxsolns]

{-0.8387096774, 0.06451612903}


ContourPlot p x+3y == —4, —7 x+ 2y == 6, (x— xsolns)? +(y— 
ysolns)? == ah {x, —5, 5}, {y, —5, 5}, ContourStyle— {Directive[ 
Red, Thick], Directive[Blue, Thick], Directive[Black, Thick] 3] 


Figure 1.19: Solution to the system of two lines is a point. 




{Eqn1, Eqn2} = {(x - 7)^2 + (y - 2)^2 == 25, (x - 1)^2 + (y - 5)^2 == 16};
Soln = Solve[{Eqn1, Eqn2}, {x, y}]

{{x -> 1/5 (17 - 2 Sqrt[11]), y -> 1/5 (19 - 4 Sqrt[11])},
{x -> 1/5 (17 + 2 Sqrt[11]), y -> 1/5 (19 + 4 Sqrt[11])}}




approxsolns = N[Soln]

{{x -> 2.07335, y -> 1.1467}, {x -> 4.72665, y -> 6.4533}}

solns = {x, y} /. approxsolns

{{2.07335, 1.1467}, {4.72665, 6.4533}}


{IntersPt1, IntersPt2} = {solns[[1]], solns[[2]]}; 
ContourPlot [{(@«-7)?+(—-2)? == 25, (x—1)?+(y—5)? == 16, (x— 
IntersPt1[[1]])? + (y — IntersPt1[[2]])? == Š (x — IntersPt2[[1]])? + 


1 
(y — IntersPt2[[2]])? == a} {x,—5,15}, {y,—5,15}, ContourStyle—{ 
Directive{[Red,Thick], Directive[Blue,Thick], Directive[Black,Thick], 
Directive{Black, Thick]}| 


Figure 1.20: Intersection of two circles is a pair of points. 




Eqns = {(x - 7)^2 + (y - 2)^2 == 25, (x + 2)^2 + (y - 5)^2 == 16};
Soln = Solve[Eqns, {x, y}]

{{x -> 1/20 (41 - i Sqrt[89]), y -> 1/20 (73 - 3 i Sqrt[89])},
{x -> 1/20 (41 + i Sqrt[89]), y -> 1/20 (73 + 3 i Sqrt[89])}}

N[Soln]

{{x -> 2.05 - 0.471699 i, y -> 3.65 - 1.4151 i},
{x -> 2.05 + 0.471699 i, y -> 3.65 + 1.4151 i}}


1.7 Basic Programming 


In this final section of Chapter 1, we will discuss some basic mathematical
programming concepts such as the For loop command, the use of
commands such as While and Do, and the general concept of a procedure
in Mathematica, which consists of a series of commands with semicolons
following each step of the procedure. A procedure’s steps will be executed
in order from left to right if they appear all on the same line, or from first
to last if they are on successive lines in the same cell. In general, by
design, the output of a procedure is the result of its final step.
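A very small procedure illustrates this behavior. The names a and b are hypothetical and are used only in this sketch:

```mathematica
(a = 3; b = a^2; a + b)      (* the steps run in order; the output is that of the last step: 12 *)
(a = 3; b = a^2; a + b;)     (* a semicolon after the last step suppresses the output *)
```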


Let’s begin by doing a few examples involving For loops such as printing
values, doing sums and products, and using the commands Do and If. The
first For loop will print z, starting with z = x^2, as k goes from 1 to 4,
where at each step it takes the previous value of z and plugs it into z^k + z
for the current value of k. The commas in the For loop separate its four
main parts: first the initialization (here of the loop variable k and of z),
then the test that must hold for the loop to continue, then the increment
of k, and finally the instructions to be carried out at each step (value of k)
of the loop, here updating z and printing it. Semicolons separate the
individual steps within a single part.
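The structure of a For loop can be sketched on a simpler task, printing the first three squares, together with an equivalent While loop. Both loops are our own illustration:

```mathematica
(* For[start, test, increment, body] *)
For[k = 1, k <= 3, k++, Print[k^2]]    (* prints 1, 4, 9 on separate lines *)

(* the same loop written with While; the counter is advanced inside the body *)
k = 1;
While[k <= 3, Print[k^2]; k++]
```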


For[k = 1; z = x^2, k <= 4, k++, z = z^k + z; Print[z]]

2 x^2

2 x^2 + 4 x^4

2 x^2 + 4 x^4 + (2 x^2 + 4 x^4)^3

2 x^2 + 4 x^4 + (2 x^2 + 4 x^4)^3 + (2 x^2 + 4 x^4 + (2 x^2 + 4 x^4)^3)^4


The second For loop will compute and display the sums of the reciprocals 
of the factorial function. The results get closer and closer to the number e 
which is approximately 2.71828. 




For[k = 1, k <= 10, k++,
Print[NSum[1/n!, {n, 0, k}, WorkingPrecision -> 10]]]


2.000000000 
2.500000000 
2.666666667 
2.708333333 
2.716666667 
2.718055556 
2.718253968 
2.718278770 
2.718281526 
2.718281801 


The next two procedures will do an addition of the first 10 positive
integers and then a multiplication of the same integers using the Do loop
command:


sum = 0;
Do[sum = sum + k, {k, 1, 10}]; sum

55

Sum[k, {k, 1, 10}]

55

prod = 1;
Do[prod = k prod, {k, 1, 10}]; prod

3 628 800

10!

3 628 800

prod = 1;
Do[prod = prod k, {k, 1, 10}]; prod

3 628 800


Sometimes the most convenient way to create a somewhat complicated 
function is to use a procedure for its rule. We will first look at two simple 
examples that create the factorial function and the add function using the 
procedures of the previous section to do it. Remember to place 
parentheses around all of the steps in your procedure. The output of the 
function will be the result of the last step of your procedure, which is the 
function’s rule. The variable n in both the factorial and add functions 
below takes on only a positive integer value: 


factorial[n_] := (prod = 1; Do[prod = k prod, {k, 1, n}]; prod) 
factorial[10] 


3 628 800 


add[n_] := (sum = 0; Do[sum = sum + k, {k, 1, n}]; sum) 
add[10] 


55 
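The two definitions above assign to the global names prod and sum each time they are called. One possible alternative, which is not the approach taken in the text, is to keep the working variable local with Module. The name factorialLocal is hypothetical:

```mathematica
(* Module makes prod a local variable, so no global prod is created or changed *)
factorialLocal[n_] := Module[{prod = 1}, Do[prod = k prod, {k, 1, n}]; prod]
factorialLocal[10]     (* 3628800 *)
```

This may be convenient when several functions would otherwise share the same helper names.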


Now we turn to a more sophisticated example involving piecewise 
functions and the use of the nested If command. Mathematica has a 
Piecewise command to build piecewise functions, but it is not always 
sufficient or convenient for all purposes. Let’s create a function called 
quadrant that takes a point in the plane and tells us which quadrant it is in 
or if it is on an axis and thus is not in any quadrant. Note that && is the 
Mathematica logical AND for joining two conditions so that both must be 
true for them together to be true. The logical OR is || with no space 
between the two vertical bars. The Return command used below will 
return an output as well as terminate the If, While, For or Do loop that 
you might be in. There is also a command Break which terminates loops. 
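The logical operators and the Break command can be tried out on their own before they are used inside a function. This is a small sketch with values of our own choosing; the name last is hypothetical:

```mathematica
(* && requires both conditions to hold, || requires at least one *)
3 > 0 && -2 > 0      (* False *)
3 > 0 || -2 > 0      (* True *)

(* Break leaves the Do loop as soon as k^2 exceeds 20 *)
last = 0;
Do[If[k^2 > 20, Break[]]; last = k, {k, 1, 10}];
last                 (* 4, since 5^2 = 25 is the first square above 20 *)
```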




Be very careful in Mathematica to avoid using the letters C, D and N as 
variables in your functions or anywhere else as they are Mathematica 
command names. It is better to stick to using lowercase letters as your 
variable names and output for any function or procedure: 


quadrant[pt_] := If[pt[[1]] > 0 && pt[[2]] > 0, Return[quadrant1],
If[pt[[1]] < 0 && pt[[2]] > 0, Return[quadrant2],
If[pt[[1]] < 0 && pt[[2]] < 0, Return[quadrant3],
If[pt[[1]] > 0 && pt[[2]] < 0, Return[quadrant4], axis]]]]

quadrant[{-3, 9}]

quadrant2

quadrant[{0, 9}]

axis




Chapter 2

Linear Systems of Equations and Matrices

2.1 Linear Systems of Equations


The basic idea behind a general linear system of equations is that of a 
system of two xy-plane line equations ax + by = c and dx + ey = f, for real
constants a through f. Such a system of two-line equations in the xy-plane 
has a simultaneous solution, the intersection set of the two lines. In 
general, we consider a simultaneous solution to be the collection of points 
that simultaneously satisfies all equations in a given system of equations. 
In the two-line equation system example we have only three possibilities. 
If the two lines are not parallel, then their intersection set is a single point 
P; hence the simultaneous solution will simply be {P}. If the two lines are 
parallel and distinct, then their intersection set is empty and there is no 
simultaneous solution. Last, if the two lines are the same, the simultaneous 
solution is simply the entire line, which consists of an infinite number of 
points. We now give an example of each of these three situations. As you 
will see, the solution points of each of these three systems are where each 
pair of lines intersect. In these systems, in order to solve by hand for the 
solution points, we must try to solve for one of the variables x or y by 
itself without the other variable. In essence, we are assuming each pair of 
lines intersect at a single point unless we discover otherwise. 


Example 2.1.1. Our first example is to solve the following system:

(2.1)    5x + 2y = 9
         8x - 3y = -4


In order to solve for one of the variables x or y by itself without the other variable, we can 
multiply the first equation by 3 and multiply the second equation by 2, which makes the 
coefficients of y negatives of each other. Now, we can add the two equations together 
eliminating the variable y, and so solving for x. Similarly, we can solve for y alone by 
multiplying the first equation by 8 and multiplying the second equation by 5 and then 
subtracting the second equation from the first: 


   3(5x + 2y = 9)            8(5x + 2y = 9)
 + 2(8x - 3y = -4)         - 5(8x - 3y = -4)
   31x + 0y = 19             0x + 31y = 92
   x = 19/31                 y = 92/31

These two equations tell us that the system of two lines intersects at the point
(19/31, 92/31). So the solution to this system is a single point. Now, we want the Solve
command of Mathematica to give us the same result.


Solve[{5x + 2y == 9, 8x - 3y == -4}, {x, y}]

{{x -> 19/31, y -> 92/31}}


Last, for this example we plot these two lines and see that they intersect at this point, as 
depicted in Figure 2.1. The command ContourPlot will plot implicitly defined equations, 
or systems of implicitly defined equations, in two variables: 


ContourPlot[{5x + 2y == 9, 8x - 3y == -4}, {x, -5, 5}, {y, 0, 5},
ContourStyle -> {{Red, Thickness[0.01]}, {Blue, Thickness[0.01]}},
PlotRange -> All, Axes -> True, Frame -> False, AspectRatio -> 2/3]


Figure 2.1: The intersection of system (2.1) is the single point (19/31, 92/31).


Example 2.1.2. In our second example we solve the following system:

(2.2)    5x + 2y = 9
         10x + 4y = -7

Notice that the first equation for system (2.2) is the same as that of system (2.1).


Applying the same ideas of Example 2.1.1, we multiply the first equation by 2 and then
subtract the equations:

   2(5x + 2y = 9)
 - (10x + 4y = -7)
   0x + 0y = 25
   0 = 25

The equation 0 = 25 is impossible, and so our system of two lines has no solution. This
should not be surprising since these two lines are not identical, but are parallel since they
have the same slope m = -5/2. The Solve command gives an empty result, indicating that the
system has no solution or intersection point. This situation is illustrated in Figure 2.2.




Solve[{5x + 2y == 9, 10x + 4y == -7}, {x, y}]

{}

ContourPlot[{5x + 2y == 9, 10x + 4y == -7}, {x, -5, 5}, {y, 0, 5},
ContourStyle -> {{Red, Thickness[0.01]}, {Blue, Thickness[0.01]}},
PlotRange -> All, Axes -> True, Frame -> False, AspectRatio -> 1/2]


Figure 2.2: The parallel lines of system (2.2) do not intersect, so the solution set is empty.




So we have done an example of two lines with one intersection point, two 
lines with no intersection point, and all that is left is to do an example that 
results in an infinite number of solutions. 


Example 2.1.3. Now, we consider the following system of equations: 
\[
\begin{aligned}
5x + 2y &= 9\\
-5x - 2y &= -9
\end{aligned}
\tag{2.3}
\]


On inspecting system (2.3), it should be clear that the line equations are really the same 
since the second is the negative of the first. Thus, there is truly only one equation in this 
system, the first equation. So the solution points of this system are all of the points that lie 
on this first equation’s line. Notice that if we add the equations of this system, we get the 
equation 0 = 0, which tells us that one of our equations is not needed. 




Solve[{5x + 2y == 9, -5x - 2y == -9}, {y, x}]


Solve::svars : Equations may not give solutions for all "solve" variables. >>


{{y -> 9/2 - (5 x)/2}}


The Solve command has told us that the solutions of this system of two lines are all of the
points $\left(x, -\frac{5}{2}x + \frac{9}{2}\right)$, as x varies over all real numbers. In this solution x is the variable
that determines the solutions as x changes its value. This is an infinite number of solution
points, one for each value of x.
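To make "one solution point for each value of x" concrete, here is a small sketch that generates a few points of the form (x, -5/2 x + 9/2) and confirms that each satisfies both equations of system (2.3). The names eqs, pts, x0, and p are illustrative.

```mathematica
eqs = {5 x + 2 y == 9, -5 x - 2 y == -9};

(* Sample points on the line for x = -1, 0, 1 *)
pts = Table[{x0, -5/2 x0 + 9/2}, {x0, -1, 1}]   (* {{-1, 7}, {0, 9/2}, {1, 2}} *)

(* Substitute each sample point into both equations *)
checks = Table[eqs /. {x -> p[[1]], y -> p[[2]]}, {p, pts}]
```

Every entry of checks is True, and any other value of x would work equally well.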


The situation for a system of two lines in the xy-plane turns out to be 
typical for general linear systems, where simultaneous solutions contain 
no points, a single point, or an infinite number of points. Next, we 
introduce some definitions that will help us to understand the more 
general situation. 


Definition 2.1.1. A scalar is defined to be a real or complex constant. 


In linear algebra, a scalar is essentially a synonym for a constant, and this 
terminology originally was used in physics to distinguish between 
constants versus vectors. For simplicity, and so that we can plot our 
results, we shall generally assume that a scalar is a real constant. 
However, sometimes we shall take our scalars to be complex: real-life
problems usually need only real scalars, but not always, as we
will see in Chapter 12 with eigenvalues and eigenvectors. Recall that the
complex numbers $\mathbb{C}$ are an extension of the real numbers $\mathbb{R}$ where


\[ \mathbb{C} = \{\, a + bi \mid a, b \in \mathbb{R},\ i^2 = -1 \,\} \]


The complex numbers $\mathbb{C}$ were created in order to solve algebra problems
(e.g., solving polynomial equations and factoring polynomials where their
coefficients are all real).


For clarity, we shall often say real scalar, although you should not be 
surprised if we use or say complex scalar. The basic facts of how linear 
systems of equations work do not depend on whether our scalars are real 
or complex, as we shall see throughout the rest of this book. 


Definition 2.1.2. A linear equation in the variables $x_1, x_2, \ldots, x_n$ is of the
form


\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \]


for scalars $a_1, a_2, \ldots, a_n$ and b. If the scalars in a linear equation are all real
numbers, then the values of its variables are also all real, but if at least one
scalar is a complex number, then the values of the linear equation's
variables are also complex.


These definitions help us generalize the equation of a line in the xy-plane 
with variables x and y. If a linear equation has only three variables, then 
they are usually denoted by x, y, and z with the linear equation written as 
ax + by + cz = d for real constants a through d. The plot of such a 
three-variable linear equation is a plane in space with coordinates x, y, and 
z (see, e.g., Fig. 2.3). Now, let us have Mathematica plot the three variable 
linear equation 5x — 2y + 7z = 15 in order to see that this is true. 


f = 5x - 2y + 7z == 15;


ContourPlot3D[Evaluate[f], {x, -7, 7}, {y, -7, 7}, {z, -7, 10},
Mesh->None, ContourStyle->Red, AspectRatio->1/2]


Figure 2.3: The plane defined by f. 


Now let us discuss why the plot of the linear equation ax + by = c is a line 
(one-dimensional) while the plot of the linear equation ax + by + cz = d is 
a plane (two-dimensional). The reason lies in the fact that the equation ax
+ by = c has one independent variable x and one dependent variable y;
that is, when b ≠ 0 it can be written as $y = -\frac{a}{b}x + \frac{c}{b}$, and the number of
independent variables in an equation is the number of degrees of freedom
in expressing the equation's solution points $(x, y) = \left(x, -\frac{a}{b}x + \frac{c}{b}\right)$.
Similarly, the equation ax + by + cz = d has two independent variables x
and y, while only one dependent variable z; that is, when c ≠ 0 it can be written as
$z = -\frac{a}{c}x - \frac{b}{c}y + \frac{d}{c}$, and the number of independent variables in an
equation is the number of degrees of freedom in expressing the equation's
solution points $(x, y, z) = \left(x, y, -\frac{a}{c}x - \frac{b}{c}y + \frac{d}{c}\right)$. Since the dimension
of a geometric object is really the number of degrees of freedom in
expressing it, the equation ax + by = c has dimension one and is a line
while the equation ax + by + cz = d has dimension two and is a plane.


Definition 2.1.3. The dimension of a geometric object is the minimum 
number of degrees of freedom needed to express the equation’s solution 
points. Equivalently, the dimension of a geometric object is the minimum 
number of independent variables it takes to describe uniquely all of the 
points of the geometric object. 


By independent variables, we mean variables that do not depend on any 
other variables. By dependent variables, we mean variables that do 
depend on other variables. If we see an expression, such as z = f(x,y), z is a 
dependent variable, which depends on the two independent variables x and 
y. Now, it should be clear that a general linear equation 


\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \tag{2.4} \]


has dimension n - 1 as a geometric object in n-dimensional space $\mathbb{R}^n$. For
example, $\mathbb{R}^2$ is two-dimensional space and is often pictured as the
xy-plane while $\mathbb{R}^3$ is three-dimensional space and is pictured as xyz-space.


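Definition 2.1.4. $\mathbb{R}^n$ is the collection of n-tuples of all real numbers:


\[ \mathbb{R}^n = \{\, (x_1, x_2, \ldots, x_n) \mid x_k \in \mathbb{R} \text{ for } k = 1, 2, \ldots, n \,\} \]

Definition 2.1.5. $\mathbb{C}^n$ is the collection of n-tuples of all complex numbers:

\[ \mathbb{C}^n = \{\, (z_1, z_2, \ldots, z_n) \mid z_k \in \mathbb{C} \text{ for } k = 1, 2, \ldots, n \,\} \]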


From these two definitions, we can now see that points in $\mathbb{R}^n$ of the form


\[ \left( x_1, x_2, \ldots, x_{n-1},\; -\frac{a_1}{a_n}x_1 - \frac{a_2}{a_n}x_2 - \cdots - \frac{a_{n-1}}{a_n}x_{n-1} + \frac{b}{a_n} \right) \]


satisfy the general linear equation (2.4). If $a_n = 0$, then we simply solve for
$x_k$, where $k \neq n$ and $a_k \neq 0$, with the resulting point still being a point of $\mathbb{R}^n$.


Now back to discussing linear systems of equations and their simultaneous 
solutions. From the information above, a line is one-dimensional and the 
intersection set of two lines is usually a point, which is zero-dimensional. 
We have already discussed the situations in which the intersection of two 
lines is not zero-dimensional, so we will not repeat ourselves here. If 
instead, we consider three random lines in the plane, then there is a high 
probability that the intersection set of these three random lines is empty. 
However, it is possible that the intersection set could be of dimension one, 
or zero, as well. The intersection set of a line and a plane is normally a 
single point, which is zero-dimensional. However, the simultaneous 
solution set could have dimension one, or even be empty. You should 
determine for yourself the required orientation between the plane and line 
for their resulting intersection to be one-dimensional or empty. 


As well, a plane is two-dimensional and the intersection set of two planes 
is usually a line that is one-dimensional. The intersection set could be the 
plane itself, which is two-dimensional, or empty, which has no dimension. 
The intersection set of three planes is usually a point that is 
zero-dimensional. One can think of this as first intersecting two planes, 
which results in a line and then intersecting the resulting line with the 
third plane, which is usually a single point. The intersection set could also 
be the plane itself, which is two-dimensional, a line that is 
one-dimensional, or empty with no dimension. The intersection set of four 
planes is usually empty and has no dimension, although the intersection 
set could be of any dimension two or lower. Hopefully, you see a pattern 
developing here. 


The general idea of this discussion is that in looking for the simultaneous 
solution or intersection set to a system of m linear equations, if each linear 
equation in the system is a geometric object of the same dimension n (i.e.,
n + 1 variables), then the dimension of the solution set is usually (n + 1) -
m = n - m + 1, which is equivalent to stating that the dimension is simply
given by taking the number of variables minus the number of equations,
although it can have any dimension from n down to none. Thus, a square
system of r linear equations, where we have as many equations as
variables in each equation, typically has a solution set of dimension r - r =
0, and is a single point where the dimension of each equation is r - 1.
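The counting rule above is easy to encode. The sketch below defines a hypothetical helper, expectedDimension (not a built-in command and not from the text), that returns the number of variables minus the number of equations. It only reports the usual dimension; special cases such as parallel or identical lines are not detected.

```mathematica
(* Hypothetical helper: the "usual" dimension of the solution set,
   i.e. number of variables minus number of equations *)
expectedDimension[eqs_List, vars_List] := Length[vars] - Length[eqs]

(* Two planes in three variables usually meet in a line *)
d1 = expectedDimension[{5 x - 2 y + 7 z == 15, 3 x + 8 y - z == -4}, {x, y, z}]   (* 1 *)

(* Two lines in two variables usually meet in a point *)
d2 = expectedDimension[{5 x + 2 y == 9, 8 x - 3 y == -4}, {x, y}]                 (* 0 *)
```

A negative result corresponds to the case of more equations than variables, where the solution set is usually empty.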


Let us do a few examples to illustrate what we have discovered. Since we 
cannot plot beyond three dimensions, we stick to solving linear systems of 
equations in three variables so that each equation is a plane in $\mathbb{R}^3$.


Example 2.1.4. We begin by plotting and solving the following linear system: 


\[
\begin{aligned}
5x - 2y + 7z &= 15\\
3x + 8y - z &= -4
\end{aligned}
\tag{2.5}
\]


This solution set should be one-dimensional, which is a line since these equations are a 
pair of two-dimensional planes; that is, the dimension of the solution set is the number of 
its variables minus the number of its equations, that is 3 — 2 = 1. Figures 2.4 and 2.5 
illustrate the intersection of these planes and the corresponding solution to system (2.5), 
respectively: 


g = 3x + 8y - z == -4;


TwoPlanes = ContourPlot3D[Evaluate[{f, g}], {x, -7, 7}, {y, -7, 7},
{z, -7, 17}, ContourStyle->{Red, Blue}, Mesh->None, AspectRatio->
1/2]


Figure 2.4: Plot of the planes f and g.


solnline = Solve[{f, g}, {x, y}][[1]] 


{x -> 1/23 (56 - 27 z), y -> 13/46 (-5 + 2 z)}


plotsolnline = ParametricPlot3D[{x /. solnline[[1]], y /. solnline[[2]],
z}, {z, -7, 10}, PlotStyle->{{Black, Thickness[0.01]}}];


Show[TwoPlanes, plotsolnline, PlotRange->All]


Figure 2.5: The intersection of f and g is the line solnline.


The solution solnline to system (2.5), provided by Mathematica, is one-dimensional and a
line since it expresses the solution in terms of the single independent variable z for the 
two dependent variables x and y. This solution is clearly represented as such in Figure 
2.5. 


To solve system (2.5) by hand, we must first decide which variable we would like 
expressed in terms of the others. With our current system of two equations and three 
unknowns, we should be able to express two variables as a function of the third in the 
final solution. We will choose to express the x and y variables in terms of z. A very
simple way to do this is to move the z variable, along with its coefficient, to the
right-hand side (RHS) of each equation. If we focus on the resulting left-hand side (LHS),
then we can approach the problem just like we did in Example 2.1.1. To first solve for x 
in terms of z, we must cancel the y variables. To do this, we multiply the first equation of 
system (2.5) by 4, and add it to the second. To solve for y in terms of z, we multiply the 
first equation by 3, the second by —5, and add them: 


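\[
\begin{aligned}
4(5x - 2y &= 15 - 7z) & \qquad 3(5x - 2y &= 15 - 7z)\\
3x + 8y &= -4 + z & \qquad -5(3x + 8y &= -4 + z)\\
23x &= 56 - 27z & \qquad -46y &= 65 - 26z\\
x &= \tfrac{56}{23} - \tfrac{27}{23}z & \qquad y &= -\tfrac{65}{46} + \tfrac{13}{23}z
\end{aligned}
\]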


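From the work above, we see that the solution is a line L in $\mathbb{R}^3$ defined as follows:


\[ L = \left\{ \left( \tfrac{56}{23} - \tfrac{27}{23}z,\; -\tfrac{65}{46} + \tfrac{13}{23}z,\; z \right) \;\middle|\; z \in \mathbb{R} \right\} \]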


Clearly, the approach above to solving the system by hand can be
generalized. If we have a system of m equations with n variables and m <
n, then we keep only m variables on the LHS, moving the remaining n - m
variables to the RHS and then solving the resulting system for the
m variables left. This will normally yield a solution with n - m
independent variables and m dependent variables.
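The same bookkeeping can be mirrored in Mathematica. As a sketch, take system (2.5) with the z terms already moved to the right-hand side, and solve for the m = 2 variables kept on the left. The name lhsSystem is illustrative.

```mathematica
(* System (2.5) with z moved to the RHS; x and y stay on the LHS *)
lhsSystem = {5 x - 2 y == 15 - 7 z, 3 x + 8 y == -4 + z};

(* Solve for the two dependent variables; z remains free (n - m = 3 - 2 = 1) *)
handSoln = First[Solve[lhsSystem, {x, y}]]
```

The result expresses x and y in terms of z and agrees with solnline from Example 2.1.4.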


Example 2.1.5. Now let’s add another plane (Fig. 2.6) and see if we get a single point as 
solution. We now attempt to solve and plot the linear system: 


\[
\begin{aligned}
5x - 2y + 7z &= 15\\
3x + 8y - z &= -4\\
-9x + 6y + 5z &= 7
\end{aligned}
\]


h = -9x + 6y + 5z == 7;
soln = Solve[{f, g, h}, {x, y, z}][[1]]


{x -> 13/109, y -> -65/218, z -> 215/109}


solnpoint = {x /. soln[[1]], y /. soln[[2]], z /. soln[[3]]}


{13/109, -65/218, 215/109}


ThirdPlane = ContourPlot3D[Evaluate[h], {x, -7, 7}, {y, -7, 7},
{z, -7, 17}, ContourStyle->{Yellow}, Mesh->None];

solnpointplot = Graphics3D[{White, Sphere[solnpoint, 1]}];
Show[{TwoPlanes, ThirdPlane, solnpointplot}, PlotRange->{{-7,
7}, {-7, 7}, {-7, 7}}, AspectRatio->1]


Figure 2.6: The intersection of the three planes is the single point 
solnpoint 


The point of intersection is difficult to see in Figure 2.6; however, if you enter these
commands into Mathematica, you can rotate the figure any way you wish and will be able
to see the point of intersection more clearly. At this point, we would once again like to
remind you of the Mathematica notebooks located on the book’s website, whose link is 
given in the Preface. This allows you to easily download the notebook containing all of 
the commands in this section so that you can rotate Figure 2.6 as much as you want. We 
do suggest that you type some of these commands in yourself in order to get used to 
Mathematica and its syntax. After all, watching a pianist perform may give you the idea 
of how to play the piano, but if you do not do any playing yourself, you will still be a 
lousy pianist. 




Now we see one last example where Mathematica can solve the system, 
but we cannot plot the linear equations and their simultaneous solution 
because the dimension of each linear equation’s solution set is four since 
each equation has five variables. 


Example 2.1.6. Let us solve the linear system given by 


\[
\begin{aligned}
5x - 2y + 7z - 3w - t &= 25\\
3x + 8y - z + 2w - 7t &= -4\\
-9x + 6y + 10z + w + 5t &= 7
\end{aligned}
\]


Note that each of these linear equations represents a four-dimensional geometric object. 
Since we have three linear equations, we expect the simultaneous solution to be a (5 — 3 = 
2)-dimensional plane. 


Solve[{5x-2y+7z-3w-t == 25, 3x+8y-z+2w-7t == -4,
-9x+6y+10z+w+5t == 7}, {x, y, z}]


{{x -> 1/551 (762 + 395 t + 94 w),
y -> (5 (-159 + 128 t - 63 w))/1102,
z -> -(2/551) (-655 + 56 t - 62 w)}}


This solution has two independent variables w and t with three dependent variables x, y,
and z. So the dimension of the solution to this system is two. If we solve the system 
consisting of only the first two linear equations, then we expect to get a simultaneous 
solution that is (5 — 2 = 3)-dimensional. 


Solve[{5x-2y+7z-3w-t == 25, 3x+8y-z+2w-7t == -4}, {x, y}]


{{x -> 1/23 (96 + 11 t + 10 w - 27 z),
y -> 1/46 (-95 + 32 t - 19 w + 26 z)}}


This solution has three independent variables z, w, and t with two dependent variables x
and y. So the dimension of the solution to this system is three. The two solutions we get 
above do agree with our dimensional analysis for each linear system. 


All of the examples and equations that we have looked at thus far have 
been linear. In fact, most of the equations that we will look at will be 
linear. Linear equations are nice because they are easy to work with, and 
many ideas in one or two dimensions can be generalized to higher 
dimensions. On the other hand, most equations that occur in science and 
mathematics are nonlinear, and the last couple of Mathematica problems 
illustrate just a small sample of the various types of nonlinear equations. 




Homework Problems 


1. Give conditions on two lines such that their intersection results in 
a simultaneous solution that is (a) of dimension zero, (b) of 
dimension one, (c) empty. 

2. Give conditions on a line and a plane such that the intersection of 
these two geometric objects results in a simultaneous solution that is 
(a) of dimension zero, (b) of dimension one, (c) empty. 


3. Give conditions on two planes such that their intersection results 
in a simultaneous solution that is (a) of dimension one, (b) of 
dimension two, (c) empty. 

4. Determine the dimension of the solutions to Mathematica 
problem 2. You do not have to solve the systems by hand. 


5. Solve by hand, for all of its intersection points (if any), each of 

the following linear systems of two lines: 

(a) 3x - 2y = 9        (b) -2x - 10y = -2     (c) 7x + 14y = -21
    5x + 4y = -13          x + 5y = 9             x + 2y = -3


6. Solve by hand, for all intersection points (if any), each of the 
following linear systems of three planes: 


(a) 5z + 4y — 7z = -13 (b) —2e+4y—7z=-2 (c) —4@+3y+82=9 
32 —- 2y+z=9 z —6y+3z= 5r + 4y — 3z = —2 
2 -T — 2y — 4z = 15 6z +y—llz=-11 
7. Show that ax + by = c and dx +ey = f, for a through f real 
constants, are two parallel lines exactly when ae — bd = 0. As a 
consequence of this, these two lines intersect with dimension zero 
exactly when ae - bd ≠ 0.
8. Show that ax + by = c and dx + ey = f, for a through f real 
constants, are two parallel lines exactly when there is some real 
number k where dx + ey = k(ax + by). 
9. Show that ax + by = c and dx + ey = f, for a through f real 
constants, has the single intersection point with coordinates 


\[ \left( \frac{ce - bf}{ae - bd},\; \frac{af - cd}{ae - bd} \right) \quad \text{when } ae - bd \neq 0. \]


10. Show that ax + by = c and dx + ey = f, for a through f real
constants, are two perpendicular lines exactly when ad + be = 0. 




11. The definition of parallel planes is that they are either identical 
or they do not intersect. Show that ax + by + cz = d and ax + by + cz 
= e, for a through e real constants, are two parallel planes. 
12. Show that the two planes ax + by + cz = d and ex + fy + gz =h, 
for a through h real constants, have the line of intersection given by 


\[ (x, y, z) = \left( \frac{(df - bh) + (bg - cf)z}{af - be},\; \frac{(ah - de) + (ce - ag)z}{af - be},\; z \right) \]


for the independent variable z when af - be ≠ 0.


Mathematica Problems 


1. Use Mathematica to graph the required lines and planes that 
illustrate your answers to homework problems 1, 2, 3, 5, and 6. 

2. Solve for the intersections of the following equations, and graph 
the simultaneous solutions where applicable: 


(a) 3t +4y=4 (b) e+ 4y=1 (c)e+4y=1 
4x + 3y=8 r+y=1 2+2y=1 
—4r+y=-4 —4r+y=-l 
(d) 3z — 5y + 62 + 2w = 1 (e) 82 — 5y + 6z = (f)r+y+z=1 
4x + 38y —2z-—w=2 4x + 3y —2z =2 z-—yt+z=1 
xz—y+2z=-1 xz—2y+z=2 
(g) 3r — 3y + 6z =2 (h) 3z — 5y +6z+6t=1 (i)r-2y+z+2t=2 
z-2y+z=3 4z + 3y- 2z- 3t=2 z—-y+z+łt=1 
z — 2y +2z = -2 z—y+2z+2t= -1 z—-2y+z+t=2 
r+y+3z= e+ytz+t=1 


3. Use the ContourPlot command to plot the following nonlinear 
equations: 




(a) $x^2 - 7xy + y^2 - 10x + 25y = 1$


(b) $x^2 - 7xy + 15y^2 - 10x + 25y = 1$


1 
(c) cos(3z — y) — x —sin(x + y) = 2 


4. Use the ContourPlot3D command to plot the nonlinear 
equations: 


(a) $(x - 9)^2 + (y - 2)^2 + (z + 5)^2 = 144$
(b) $(x - 9)^2 + (y - 2)^2 - (z + 5)^2 = 144$


(c) (z —y +z)? — r? +sin(z) = y 


5. Use the Solve command to verify homework problem 9. 
6. Use the Solve command to verify homework problem 12. 


2.2 Augmented Matrix of a Linear System and Row Operations


A general linear system of equations is always written so that each linear 
equation uses the same variables given in the same order. This makes it 
easier to write down the system, as well as to solve the system of 
equations simultaneously. 


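Definition 2.2.1. A general linear system of m equations in n variables $x_1, x_2, \ldots, x_n$
is of the form


\[
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n &= b_1\\
a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n &= b_2\\
&\;\;\vdots\\
a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,n}x_n &= b_m
\end{aligned}
\]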


The $a_{i,j}$ terms here are called the coefficients of the linear system, where
the subscript i tells you which equation you are in and the subscript j tells
you which variable it multiplies. The mathematical construct A, formed by
the $a_{i,j}$ terms, is called the matrix of coefficients of this linear system,
where the subscript i now tells you which row you are in and the subscript
j tells you which column. The matrix A, of coefficients, is of size
(dimension) $m \times n$ (read as "m by n"), where m is the number of equations
and n is the number of variables in the linear system. The array B formed
by the $b_i$ is a column matrix of size $m \times 1$ and is called the RHS matrix of
the system. The array X formed by the $x_j$ variables is a column matrix of
size $n \times 1$.


This system of m linear equations in n variables will generally have a
simultaneous solution that has dimension n - m as a geometric object in
n-dimensional space $\mathbb{R}^n$ with the variables $x_1, x_2, \ldots, x_n$. In many practical
situations where linear systems and their simultaneous solution set play a
role, it is normal for the number of equations m to be equal to the number
of variables n. When m = n, the simultaneous solution set to the linear
system usually consists of a single point, since its dimension is typically
n - m = 0. When m > n, the number of simultaneous solutions is usually
none; when m < n, the number of simultaneous solutions is usually
infinite, which is backed up by our dimensional analysis.


Before we go any further, we will introduce the definition of a matrix, 
since we will be using the word quite frequently. 


Definition 2.2.2. A matrix, or two-dimensional array, is a collection of 
real or complex numbers arranged in rows and columns. If the entries of a 
matrix A are real numbers, and there are m rows and n columns, we say 
that $A \in \mathbb{R}^{m \times n}$; likewise, if the matrix B has complex entries, we say that
$B \in \mathbb{C}^{m \times n}$.


A matrix with m rows and n columns is said to be of size m x n, which is 
read “m by n.” The following is the general structure of a matrix with m 
rows and n columns. Pay special attention to how the entries in the matrix 
are indexed: 


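\[
A = \begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n}\\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
a_{m,1} & a_{m,2} & a_{m,3} & \cdots & a_{m,n}
\end{pmatrix}
\]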


If there is only one column, then n = 1 and the matrix is called a column 
matrix. Similarly, if m = 1, the matrix is referred to as a row matrix. 
Column matrices will play an important role in vector algebra later on in 
the text. 


The following definition of the augmented matrix will allow us to place 
into a single matrix all of the information contained in a linear system of 
equations. As such, we will be able to use this augmented matrix for a 
linear system to find all of the system’s simultaneous solutions without the 
need to carry around the superfluous variable symbols of the system and 
its equal signs. 


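Definition 2.2.3. If A is an $m \times n$ matrix with entries $a_{i,j}$, and B is an $m \times k$
matrix with entries $b_{i,j}$, the augmented matrix (A|B) is the $m \times (n + k)$
matrix defined as follows:


\[
(A|B) = \left(\begin{array}{ccccc|ccccc}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} & b_{1,1} & b_{1,2} & b_{1,3} & \cdots & b_{1,k}\\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} & b_{2,1} & b_{2,2} & b_{2,3} & \cdots & b_{2,k}\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
a_{m,1} & a_{m,2} & a_{m,3} & \cdots & a_{m,n} & b_{m,1} & b_{m,2} & b_{m,3} & \cdots & b_{m,k}
\end{array}\right)
\]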


Let us do an example of forming the coefficient matrix A and the RHS 
matrix B for a linear system as well as the augmented matrix of the system 
(A|B), which is the column B written in after the last column of A. The 
Mathematica command Join will join the matrix B to A. Be sure to
include the extra third argument 2, so that Mathematica knows to augment as extra
columns; otherwise it will attempt to add the matrix as extra rows.




Example 2.2.1. We will use the following linear system 
\[
\begin{aligned}
5x - 3y &= 9\\
2x + 7y &= -4
\end{aligned}
\]

and find the augmented matrix of this linear system. 


(A = {{5, -3}, {2, 7}}) // MatrixForm

5  -3
2   7

(B = {{9}, {-4}}) // MatrixForm

 9
-4

Join[A, B, 2] // MatrixForm

5  -3   9
2   7  -4

This augmented matrix for the system of linear equations represents all of 
the information of the original system and so can be used to solve the 
system without using variables or equations. Each row of an augmented 
matrix for a linear system represents an equation of the system, while each 
column except the last one is a variable coefficient column. The last 
column of an augmented matrix always consists of the values from the 
RHS of the equations. 
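Since each row is an equation and the last column holds the RHS values, the original system can be rebuilt from the augmented matrix alone. This is a minimal sketch; the names aug, vars, coeffs, and rhs are illustrative, and the choice of variable names x and y is ours.

```mathematica
aug = {{5, -3, 9}, {2, 7, -4}};   (* augmented matrix of Example 2.2.1 *)
vars = {x, y};                     (* one variable per coefficient column *)

coeffs = aug[[All, 1 ;; -2]];      (* every column except the last *)
rhs = aug[[All, -1]];              (* the last column *)

(* Multiply the coefficient matrix by the variables and equate row by row with the RHS *)
rebuilt = Thread[coeffs . vars == rhs]   (* {5 x - 3 y == 9, 2 x + 7 y == -4} *)
```

Nothing about the system is lost when the variable symbols and equal signs are dropped.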


Example 2.2.2. Let C be the following augmented matrix of a system of linear equations: 


\[
C = \begin{pmatrix}
6 & 1 & 0 & 4 & -3\\
-9 & 2 & 3 & -8 & 1\\
7 & 0 & -4 & 5 & 2
\end{pmatrix}
\]


This system has three equations represented by the rows of C and four variables called x, 
y, z, and w represented by the first four columns of C. This augmented matrix C 
represents the linear system 


\[
\begin{aligned}
6x + y + 4w &= -3\\
-9x + 2y + 3z - 8w &= 1\\
7x - 4z + 5w &= 2
\end{aligned}
\tag{2.6}
\]


Now, the question is: How can these augmented matrices be used to solve the underlying 
linear system of equations for their simultaneous solution? Let us take the augmented 
matrix C of the last example. We know that its solution set probably has dimension 4 — 3 
= 1, which says that we should be able to write the solution in terms of the single 
independent variable w with dependent variables x, y, and z. We defined w to be the 
independent variable since it corresponds to the last column of variables in the augmented 
matrix. You may find that for a particular problem it is easier to treat another variable as 
the independent variable. However, in a case like that, you can simply switch columns so 
that the correct variable is the second-to-last column in the matrix; just be sure to 
remember which column corresponds to each variable. Now, back to our problem. We 
want to manipulate the augmented matrix C for the original system into the new 
augmented matrix 


\[
D = \begin{pmatrix}
1 & 0 & 0 & d_{1,4} & d_{1,5}\\
0 & 1 & 0 & d_{2,4} & d_{2,5}\\
0 & 0 & 1 & d_{3,4} & d_{3,5}
\end{pmatrix}
\tag{2.7}
\]


which is equivalent to the original augmented matrix C (same solution set) from which 
we can read off the solution 


\[ \{\, x = -d_{1,4}w + d_{1,5},\quad y = -d_{2,4}w + d_{2,5},\quad z = -d_{3,4}w + d_{3,5} \,\} \tag{2.8} \]


to the original system. Be sure to pay attention to the way in which the matrix D 
corresponds to the solution given above. 


The main question we need to ask is: Why do the operations that take C to D not change the
solution set? Given a linear equation of the form (2.4), notice that multiplying both sides
of the equation by a nonzero constant α, resulting in


\[ \alpha\,(a_1 x_1 + a_2 x_2 + \cdots + a_n x_n) = \alpha b \tag{2.9} \]


does not change the solution to the original equation, since we could later divide by α to
retrieve the original equation. Similarly, given two equations


\[
\begin{aligned}
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n &= b\\
c_1 x_1 + c_2 x_2 + \cdots + c_n x_n &= d
\end{aligned}
\]


the sum of the two equations is


\[ (a_1 + c_1)x_1 + (a_2 + c_2)x_2 + \cdots + (a_n + c_n)x_n = b + d \]


We can retrieve either equation from this sum by subtracting either equation from the 
sum, and so replacing one of the two equations with this sum is reversible and does not 
change the simultaneous solution set of the original system. 
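The reversibility of these two operations can be tried out on a matrix. The sketch below defines two hypothetical helpers, scaleRow and addRow (they are not built-in commands and are not used elsewhere in the text), and checks that each operation can be undone.

```mathematica
(* Hypothetical helpers, named here for illustration only *)
scaleRow[m_, i_, c_] := ReplacePart[m, i -> c m[[i]]]                       (* row i becomes c * row i *)
addRow[m_, src_, dst_, c_] := ReplacePart[m, dst -> c m[[src]] + m[[dst]]]  (* row dst becomes c * row src + row dst *)

m0 = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {7, 0, -4, 5, 2}};

m1 = scaleRow[m0, 1, 1/6];
undoScale = (scaleRow[m1, 1, 6] == m0)      (* True: multiplying by 6 undoes multiplying by 1/6 *)

m2 = addRow[m1, 1, 2, 9];
undoAdd = (addRow[m2, 1, 2, -9] == m1)      (* True: adding -9 times row 1 undoes adding 9 times row 1 *)
```

Because every step can be reversed, no solutions are gained or lost along the way.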


Back to the problem at hand, when inspecting C, notice that in order to change the 6 at
the top of the first column of C into a 1, we must multiply the first row of C by 1/6. Then,
we will multiply the new first row by 9 and add it to the second row in order to change —9 
to 0. The rest of the row operations are given below to get the final augmented matrix of 
the system in the form of equation (2.7). There are no built-in commands to manipulate 
rows in the manner we need; however, we can manually change rows as desired. Be sure 


you understand all of the following operations and matrix definitions as we progress 
through the process to the desired final augmented matrix form. 


(CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {7, 0, -4, 5, 2}}) //
MatrixForm

 6    1    0    4   -3
-9    2    3   -8    1
 7    0   -4    5    2


(CM1 = {1/6 CM[[1]], CM[[2]], CM[[3]]}) // MatrixForm

 1   1/6   0   2/3  -1/2
-9    2    3   -8     1
 7    0   -4    5     2


(CM2 = {CM1[[1]], 9 CM1[[1]] + CM1[[2]], CM1[[3]]}) // MatrixForm

 1   1/6   0   2/3  -1/2
 0   7/2   3   -2   -7/2
 7    0   -4    5     2

(CM3 = {CM2[[1]], CM2[[2]], -7 CM2[[1]] + CM2[[3]]}) // MatrixForm

 1   1/6    0   2/3  -1/2
 0   7/2    3   -2   -7/2
 0  -7/6   -4   1/3  11/2


(CM4 = {CM3[[1]], 2/7 CM3[[2]], CM3[[3]]}) // MatrixForm

 1   1/6    0    2/3  -1/2
 0    1    6/7  -4/7   -1
 0  -7/6   -4    1/3  11/2


(CM5 = {-1/6 CM4[[2]] + CM4[[1]], CM4[[2]], CM4[[3]]}) // MatrixForm

 1    0   -1/7  16/21  -1/3
 0    1    6/7  -4/7    -1
 0  -7/6   -4    1/3   11/2


(CM6 = {CM5[[1]], CM5[[2]], 7/6 CM5[[2]] + CM5[[3]]}) // MatrixForm

 1    0   -1/7  16/21  -1/3
 0    1    6/7  -4/7    -1
 0    0    -3   -1/3   13/3


(CM7 = {CM6[[1]], CM6[[2]], -1/3 CM6[[3]]}) // MatrixForm

 1    0   -1/7  16/21  -1/3
 0    1    6/7  -4/7    -1
 0    0     1    1/9  -13/9


(CM8 = {1/7 CM7[[3]] + CM7[[1]], CM7[[2]], CM7[[3]]}) // MatrixForm

 1    0     0    7/9  -34/63
 0    1    6/7  -4/7    -1
 0    0     1    1/9  -13/9


(CM9 = {CM8[[1]], -6/7 CM8[[3]] + CM8[[2]], CM8[[3]]}) // MatrixForm

 1    0     0    7/9  -34/63
 0    1     0   -2/3    5/21
 0    0     1    1/9  -13/9

The matrix CM9 given above is the augmented matrix of the new [but equivalent to (2.6)] 
linear system with equations: 


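\[
\begin{aligned}
x + \tfrac{7}{9}w &= -\tfrac{34}{63}\\
y - \tfrac{2}{3}w &= \tfrac{5}{21}\\
z + \tfrac{1}{9}w &= -\tfrac{13}{9}
\end{aligned}
\tag{2.10}
\]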


If you followed the steps, you will notice that a value of 1 was obtained in the upper-left 
entry CM1,1 by multiplying by the reciprocal of the entry. Notice the rest of the row 
changed accordingly, just as an equation would change if you multiplied everything by a 
scalar. The next step in the process was to make every entry in the matrix below CM1,1 
equal to zero. This is done simply enough after the upper-left entry has been changed to a 
1. (Make sure you understand why this is true.) Next, we move to entry CM2,2 and 
attempt to make it a 1. After that has been accomplished, every other entry in that column 
must be made to be zero. Finally, something similar is done with the third column and the 
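entry CM3,3. The resulting matrix says that the solution to our original system is


\[ x = -\tfrac{7}{9}w - \tfrac{34}{63}, \qquad y = \tfrac{2}{3}w + \tfrac{5}{21}, \qquad z = -\tfrac{1}{9}w - \tfrac{13}{9}. \]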


This method of reducing the matrix to find the final answer will be explained in more 
detail in Chapter 3. Let us now check, using the Solve command on the original system, 
to see if we get the same solution. This solution is clearly of dimension one as expected, 
with one independent variable w expressing the solution using the three dependent 
variables x, y, and z: 


Solve[{6x + y + 4w == -3, -9x + 2y + 3z - 8w == 1, 7x - 4z +
5w == 2}, {x, y, z}]


{{x -> 1/63 (-34 - 49 w), y -> 1/21 (5 + 14 w), z -> 1/9 (-13 - w)}}


The two solutions are determined to be the same using the row operations on the 
augmented matrix of the system or using Mathematica’s Solve command. 
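As a further independent check, Mathematica's built-in RowReduce command computes the reduced row echelon form (the topic of Chapter 3) in one step. The sketch below compares its output with the matrix CM9 obtained by hand; the name cm is used here so that the example is self-contained.

```mathematica
cm = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {7, 0, -4, 5, 2}};

reduced = RowReduce[cm];
reduced // MatrixForm

(* Compare with the final matrix CM9 from the manual row operations *)
sameAsCM9 = (reduced == {{1, 0, 0, 7/9, -34/63}, {0, 1, 0, -2/3, 5/21}, {0, 0, 1, 1/9, -13/9}})
```

RowReduce does not show the intermediate steps, so it is best used to confirm a hand computation rather than to replace it while learning the method.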




A similar approach can be used on complex-valued matrices, as we show 
in the next example. The situation of a complex linear system is the same 
as the real case of the last example, except that now our variables can take 
on complex values. 


Example 2.2.3. Consider the following system of equations: 
\[
\begin{aligned}
2x - 3iy - 4z &= -1 + 2i\\
(2 - 2i)x + 3y + (1 - i)z &= -2i\\
(10 - 4i)x + (6 - 9i)y + (-10 - 2i)z &= -3 + 2i
\end{aligned}
\tag{2.11}
\]


Using the same approach as that of example 2.2.2, we put the system of equations into a 
matrix of dimension 3 x 4 with complex entries. We will attempt to get values of 1 along 
the diagonal from the upper-left to the lower-right corner, and zeros off the diagonal of 
the left three columns of the matrix without swapping any rows. Note that in the 
Mathematica code to follow, we have combined two row addition commands in certain 
spots to save space. Also, you should note that Mathematica uses the letter I (and ⅈ) for
the complex number written as i in the text of this book:


(B = {{2, -3 I, -4, -1 + 2 I}, {2 - 2 I, 3, 1 - I, -2 I}, {10 - 4 I,
6 - 9 I, -10 - 2 I, -3 + 2 I}}) // MatrixForm

   2      -3i      -4      -1+2i
 2-2i      3      1-i       -2i
10-4i    6-9i   -10-2i     -3+2i


(B1 = {1/2 B[[1]], B[[2]], B[[3]]}) // MatrixForm

   1     -3i/2     -2     -1/2+i
 2-2i      3      1-i       -2i
10-4i    6-9i   -10-2i     -3+2i


(B2 = {B1[[1]], (2 I - 2) B1[[1]] + B1[[2]], (4 I - 10) B1[[1]] + B1[[3]]})
// MatrixForm

   1     -3i/2     -2     -1/2+i
   0     6+3i     5-5i     -1-5i
   0    12+6i   10-10i    -2-10i


(B3 = {B2[[1]], 1/(6 + 3 I) B2[[2]], B2[[3]]}) // MatrixForm

   1     -3i/2      -2        -1/2+i
   0       1      1/3-i     -7/15-3i/5
   0    12+6i    10-10i       -2-10i


(B4 = {3/2 I B3[[2]] + B3[[1]], B3[[2]], -(6 I + 12) B3[[2]] + B3[[3]]})
// MatrixForm

   1       0    -1/2+i/2    2/5+3i/10
   0       1      1/3-i     -7/15-3i/5
   0       0        0           0

This new augmented matrix B4 gives the linear system 


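\[
\begin{aligned}
x + \left(-\tfrac{1}{2} + \tfrac{1}{2}i\right)z &= \tfrac{2}{5} + \tfrac{3}{10}i\\
y + \left(\tfrac{1}{3} - i\right)z &= -\tfrac{7}{15} - \tfrac{3}{5}i\\
0 &= 0
\end{aligned}
\tag{2.12}
\]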


which has the same solution set as the original augmented matrix and its linear system as 
given in (2.11). The equation 0 = 0 tells us that one equation of the original linear system 
is superfluous and does not help us determine the simultaneous solution set of the system. 
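A hand reduction of this length is easy to get wrong, so it can be worth substituting the result back into (2.11). The following is a minimal sketch of such a check, not part of the original example: the names eqs and sol are our own, and sol is simply the first two rows of B4 solved for x and y with z left free.

```mathematica
(* The three equations of system (2.11) *)
eqs = {2 x - 3 I y - 4 z == -1 + 2 I,
   (2 - 2 I) x + 3 y + (1 - I) z == -2 I,
   (10 - 4 I) x + (6 - 9 I) y + (-10 - 2 I) z == -3 + 2 I};

(* x and y read off from the rows of B4, with z as the free variable *)
sol = {x -> (2/5 + 3 I/10) + (1/2 - I/2) z,
   y -> (-7/15 - 3 I/5) - (1/3 - I) z};

(* Left side minus right side of each equation after substitution;
   a list of zeros means every equation holds for every z *)
Expand[(Subtract @@@ eqs) /. sol]
```

If the list that comes back is all zeros, then every equation of (2.11) is satisfied for any value of z.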


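Solve[{2 x - 3 I y - 4 z == -1 + 2 I, (2 - 2 I) x + 3 y + (1 - I) z == -2 I,
(10 - 4 I) x + (6 - 9 I) y + (-10 - 2 I) z == -3 + 2 I}, {x, y}]

\[
\left\{\left\{ x \to \left(-\tfrac{1}{10} - \tfrac{i}{5}\right)\bigl((-2+i) + (1+3i)z\bigr),\;
y \to \left(-\tfrac{1}{15} + \tfrac{i}{5}\right)\bigl((-2+3i) + 5z\bigr) \right\}\right\}
\]

Expand[%[[1]]]

\[
\left\{ x \to \left(\tfrac{2}{5} + \tfrac{3i}{10}\right) + \left(\tfrac{1}{2} - \tfrac{i}{2}\right) z,\;
y \to \left(-\tfrac{7}{15} - \tfrac{3i}{5}\right) - \left(\tfrac{1}{3} - i\right) z \right\}
\]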


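From the final matrix, or from the Solve command output given above, we see that the
system of three equations in three unknowns given in equation (2.11) has an
infinite number of solutions. The solution can be expressed with the x and y variables in
terms of the z variable as follows:

\[
x = \left(\tfrac{2}{5} + \tfrac{3i}{10}\right) + \left(\tfrac{1}{2} - \tfrac{i}{2}\right) z, \qquad
y = \left(-\tfrac{7}{15} - \tfrac{3i}{5}\right) - \left(\tfrac{1}{3} - i\right) z
\]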




Homework Problems 


1. Given an augmented matrix, describe how to determine whether 
it is in the simplest form possible for finding the solution to the 
system of equations from which the original augmented matrix was 
constructed. 

2. Given an augmented matrix, describe how to determine on 
inspection whether the system of linear equations it represents has 
no solution, or an infinite number of solutions. 


3. Explain how to find the equation of a line through two points (x1,
y1) and (x2, y2), and of a plane through three points (x1, y1, z1), (x2, y2,
z2), and (x3, y3, z3), using linear systems.

4. Convert each of the following systems to matrix form and
determine to what set the resulting matrix belongs: $\mathbb{R}^{m \times n}$ or $\mathbb{C}^{m \times n}$,
for specific values of m, n:


(a) 3a+4y=4 (b) r+4iy=1 (c) c+4y=1 
4r + 3y=8 r+y=i r+2y=-3 
—4r+y = —4i —4r-+-y=-1 
(d) 32 —5y+6z=1 (e) 3ix —5y+6z=1 (f£) z — 2y +z = 
4x + 3y -2z =2 4x + 3iy — 2z = z-y+z=1 
2—yt+ 2iz=-i r+ytz= 
(g) c+y+3z= (h) 3z —5y+6z+6t=1 (i)r—2y+2z+2t=2 
z—2y+z=3 4z + 3y — 2z- 3t=2 r—-y+z+t=1 
z—2y+z= -2 z—y+2z+2t=-1 r—-2y+z+t=2 


ctytz+t=1t 


5. Convert the following systems to matrix form and then reduce 
each to its final augmented form using the row operations discussed 
in this section. Explain each step and show the modified augmented 
matrix at each step. 




(a) 2x — 3y = —5 (b) -zr+y-z=-—2 
3r+Ty=4 r+y+2z2=6 
z+2y—z2=2 


(c) Tx + 5iy — 3z = —36 + 39% (d) — 6z + 9y + z — 2u + w = —8 
siz — 4y + z = —26 — 2li år — 3y + 9z +u -v= 13 
22 + 3y -4z -u-v = —5 


(e) 10x + 15y — 72 = —35 (f) -3r +y- 2iz=2+4i 
-$r + 5y- dz = -1 z—-y+z= 
35a — 35y + 7z = —96 Tt Iy = sed at 


6. Using the final augmented matrices from the previous exercise, 
express the solutions to the original systems in set notation using 
the given variables. 


7. If the following augmented matrices represented a system of 
equations now in reduced form, express the solutions in set 
notation. 


R E, a roa S 4 1 0 
(a) |0 10 -6/(b)}0 10 -6 O|(c)]0 1 
001 1 ; 0 


Mathematica Problems 


1. Verify your answers to homework problem 3. 

2. Solve the system of two lines given by ax + by = c and ex + fy = 
g. 

3. Solve the system of two planes given by ax + by + cz = d and
ex + fy + gz = h.

4. Solve the system of three planes given by ax + by + cz = d, ex +
fy + gz = h, and ix + jy + kz = l.

5. Use the Solve command to find the solutions to the following 
systems: 




(a) — 2w + 3r + y + 5z = 
w — 2r + 3y- z= 
6w — z — ôy - 3z = 

—3w — 3z + 2y +2 =0 


(c) — 9w + 3x + 4y —6z =3 
w — 5r + 3y + 7z = -8 
5w — 8r +y -5z =3 


(e) — 9w + 3r + 4y -6z =3 
w—5r+3y +7z= -8 
Sw — 8r +y—5z=3 
6w + x + 9y + 13z = —15 
-2w + 2r- y +2=7 


(g) (2 — i)w + 22 — 3y + 5z = 6i 
2w + 32 — y + 2z = 4 
2w + Tx — 2y — 9z = —5 


(b) — w + 4r + 8y +9z +u-v= -3 
2w -zr+5z—-u-v=4 


(d) — 9w + 3r + 4y- 6z = 3 
w — 5r + 3y + 7z = -8 
5w — 8r +y—5z=3 
6w +x + 9y + 13z = —-16 


(£) 13v — l6w + 22 — 3y + 6z = 2 
v— 5w + 3x + Ty -— 8z = 
3v + 4w — 6x + 3y + 5z = —8 
v— 5w +3r+6y+z=9 
2v — 3w +r — y +7z=3 


(h) 2v — (5 + i)w + 3x + 6y + Tiz = 8 
3v + Tw + 2x + (6+ 3i)y = 6 


3v + (5 + 2i)w + 2x + Ty + 5z = —4i 


2iz + 5y —13z =5 8v + 9w + (2 — 3i)y +4z = 2 


v+w+(3+2)2+6y+7z=8 


6. Redo Mathematica problem 5 without swapping rows on the 
augmented matrix of each system. What is the dimension of the 
solution set to each system? 


2.3 Some Matrix Arithmetic 


In this section, we want to describe how to add, subtract, and multiply two 
matrices, as well as multiply matrices by scalars. Division of matrices will 
be postponed to a future section, since it is not obvious how to do division 
of matrices or even if it can be done at all. 


If we think of a matrix as analogous to a spreadsheet of data, then adding 
or subtracting two matrices should correspond to adding or subtracting the 
data in two spreadsheets covering the same topic. This tells us that we 
should add or subtract two matrices only if they have the same size and 
then by adding or subtracting corresponding entries. Additionally, if you 




want to multiply a matrix by a constant c, then multiply each entry of the 
matrix by c. 


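Example 2.3.1. Some examples would now be useful. Consider the following two
matrices:

\[
A = \begin{pmatrix} -2 & 5 & 1 & 9 \\ 3 & 0 & -4 & 7 \end{pmatrix}, \qquad
B = \begin{pmatrix} 6 & -2 & 4 & 1 \\ -8 & -1 & 3 & 5 \end{pmatrix}
\]

Let us find A + B, A - B, -3A, and -3A + 5B.

(A = {{-2, 5, 1, 9}, {3, 0, -4, 7}}) // MatrixForm

\[
\begin{pmatrix} -2 & 5 & 1 & 9 \\ 3 & 0 & -4 & 7 \end{pmatrix}
\]

(B = {{6, -2, 4, 1}, {-8, -1, 3, 5}}) // MatrixForm

\[
\begin{pmatrix} 6 & -2 & 4 & 1 \\ -8 & -1 & 3 & 5 \end{pmatrix}
\]

(A + B) // MatrixForm

\[
\begin{pmatrix} 4 & 3 & 5 & 10 \\ -5 & -1 & -1 & 12 \end{pmatrix}
\]

(A - B) // MatrixForm

\[
\begin{pmatrix} -8 & 7 & -3 & 8 \\ 11 & 1 & -7 & 2 \end{pmatrix}
\]

-3 A // MatrixForm

\[
\begin{pmatrix} 6 & -15 & -3 & -27 \\ -9 & 0 & 12 & -21 \end{pmatrix}
\]

(-3 A + 5 B) // MatrixForm

\[
\begin{pmatrix} 36 & -25 & 17 & -22 \\ -49 & -5 & 27 & 4 \end{pmatrix}
\]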


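Definition 2.3.1. Let $A, B \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}$. The sum $A + B \in \mathbb{R}^{m \times n}$
is defined by $(A + B)_{i,j} = A_{i,j} + B_{i,j}$. Similarly, the difference $A - B \in \mathbb{R}^{m \times n}$ is defined by
$(A - B)_{i,j} = A_{i,j} - B_{i,j}$. Finally, $cA \in \mathbb{R}^{m \times n}$ is defined by $(cA)_{i,j} = cA_{i,j}$.


Definition 2.3.2. Two matrices $A, B \in \mathbb{R}^{m \times n}$ are equal if and only if $A_{i,j} = B_{i,j}$
for all $1 \le i \le m$, $1 \le j \le n$. Symbolically, this is expressed as A = B.


Both of these definitions remain valid if $\mathbb{R}$ is replaced by $\mathbb{C}$, as is the case for
all of our linear system and matrix discussions past and future.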


Now we turn to matrix multiplication. The idea behind the definition of
how to multiply two matrices is the desire to turn a system of linear
equations into a single matrix equation of the form AX = B, where A is the
matrix of coefficients, X is the column of variables, and B is the column of
RHS values. Let us look at the square system case that is 2 x 2 (two
equations in two variables), corresponding to the following two linear
equations:

\[
\begin{aligned}
ax + by &= \alpha \\
cx + dy &= \beta
\end{aligned}
\tag{2.13}
\]

The matrix of coefficients A, the column of variables X, and the column of
RHS values B are given by

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
X = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad
B = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\]

respectively. Now, the simplest way to write our system of linear
equations as a matrix equation is

\[
\begin{pmatrix} ax + by \\ cx + dy \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\]

since equality of matrices should mean that they have the same size along
with the same corresponding entries. Thus, if we want the matrix equation
to be AX = B, we must have

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}
= \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\]

Now we have our definition of matrix multiplication on the most basic
level. This tells us that in order to multiply two matrices together, we must
multiply a row of the left matrix times a column of the right matrix to
obtain a single value that goes in the row, column location of the resulting
product matrix. When a row of the left matrix multiplies a column of the
right matrix in a product, you multiply corresponding entries of the two
together and add the results. In other words

\[
\begin{pmatrix} a & b \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \end{pmatrix}
\tag{2.14}
\]

is the defining operation in matrix multiplication. Let us do a few
examples of matrix multiplication, to see if we can find the correct pattern.
Mathematica will perform the computations, and we will give the
definition of matrix multiplication shortly. We start with a simple
multiplication of two square matrices $A, B \in \mathbb{R}^{2 \times 2}$.


Example 2.3.2. Given

\[
A = \begin{pmatrix} 3 & 2 \\ -2 & 6 \end{pmatrix}, \qquad
B = \begin{pmatrix} -1 & 5 \\ -2 & 1 \end{pmatrix}
\]

we wish to find both AB and BA. In Mathematica, matrix multiplication is written as a
period (.).

A = {{3, 2}, {-2, 6}};
B = {{-1, 5}, {-2, 1}};
(A.B) // MatrixForm

\[
\begin{pmatrix} -7 & 17 \\ -10 & -4 \end{pmatrix}
\]

(B.A) // MatrixForm

\[
\begin{pmatrix} -13 & 28 \\ -8 & 2 \end{pmatrix}
\]

Notice that if we actually follow the rule given in equation (2.14), where we take a row of
the left matrix and multiply it by a column of the second matrix, we end up with the
entries of the resulting matrix that Mathematica has given. For instance, if we take the
first row of A and multiply it by the first column of B, we get 3(-1) + 2(-2) = -7. This
happens to be the entry in the first row, first column of the matrix AB. If we take the
second row of A and multiply it by the first column of B, we get (-2)(-1) + 6(-2) = -10,
which happens to be the second row, first column entry of the matrix AB. Do you see the
pattern yet? Attempt to fill in the last column of AB, which corresponds to multiplying
the first and second rows of A by the second column of B.


From the two matrix multiplications, we also see that AB ≠ BA. This
may seem a bit strange at first, but it makes sense if we consider that we
always multiply the rows of the left matrix by the columns of the second
matrix. This will be discussed further after Example 2.3.3.


Example 2.3.3. Let us now try something a little more complicated. We want to find AB
and BA, given

\[
A = \begin{pmatrix} -2 & 5 & 1 & 9 \\ 3 & 0 & -4 & 7 \end{pmatrix}, \qquad
B = \begin{pmatrix} 6 & -8 & -3 \\ -2 & -1 & 6 \\ 3 & 5 & -9 \\ -7 & 4 & 1 \end{pmatrix}
\]

A = {{-2, 5, 1, 9}, {3, 0, -4, 7}};
B = {{6, -8, -3}, {-2, -1, 6}, {3, 5, -9}, {-7, 4, 1}};
A.B // MatrixForm

\[
\begin{pmatrix} -82 & 52 & 36 \\ -43 & -16 & 34 \end{pmatrix}
\]

B.A // MatrixForm
Dot::dotsh : Tensors {{6, -8, -3}, {-2, -1, 6}, {3, 5, -9}, {-7, 4, 1}} and {{-2,
5, 1, 9}, {3, 0, -4, 7}} have incompatible shapes. >>

{{6, -8, -3}, {-2, -1, 6}, {3, 5, -9}, {-7, 4, 1}}.{{-2, 5, 1, 9}, {3, 0, -4, 7}}

Notice that we can perform the multiplication AB, but not BA. In order to multiply BA,
the number of columns of B must equal the number of rows of A, which is not true here:
the rows of B have three entries (i.e., B has three columns), while the columns of A have
only two entries (i.e., A has two rows). However, since the
matrix A is of size 2 x 4 while B is of size 4 x 3, we can perform the multiplication AB,
which results in a matrix of dimension 2 x 3.


In general, if the matrix A has size m x k and the matrix B has size k x n,
then we can multiply in the order AB with the matrix AB having size m x
n. Both orders AB and BA can be computed only when A has size m x k and
B has size k x m, and the two products then have sizes m x m and k x k.
In particular, if both A and B are square matrices of size n x n, then
AB and BA both can be computed but are not normally equal, although
both products are of size n x n. Thus, matrix multiplication is not a
commutative operation even when both orders of multiplication can be
done.


Definition 2.3.3. Let A be a matrix of dimension m x k and B be a matrix
of dimension k x n. We define C to be the product of A and B, denoted C
= AB, of dimension m x n. Each entry of C can be computed by the
following formula:

\[
C_{i,j} = \sum_{l=1}^{k} A_{i,l} B_{l,j}
\tag{2.15}
\]


In simple terms, if we are to multiply two matrices A and B together and A 
is m x k and B is k x n, the resulting matrix C = AB will be a matrix of 
size m x n. Furthermore, the entry in the ith row and jth column of C is 




found by taking the ith row of A and multiplying it by the jth column of B 
in the way described previously. This allows for a very systematic way of 
computing the product of two matrices. 
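The row-times-column recipe above can be written out directly and compared with Mathematica's built-in product. This is only a sketch: matMul is a hypothetical helper name of our own, not a built-in command, and the matrices are the ones from Example 2.3.3.

```mathematica
(* matMul is a hypothetical helper: entry (i, j) of the product is the
   sum over l of a[[i, l]] b[[l, j]], i.e., row i of a times column j of b *)
matMul[a_, b_] := Module[{m, k, n},
  {m, k} = Dimensions[a];
  n = Last[Dimensions[b]];
  Table[Sum[a[[i, l]] b[[l, j]], {l, 1, k}], {i, 1, m}, {j, 1, n}]]

A = {{-2, 5, 1, 9}, {3, 0, -4, 7}};
B = {{6, -8, -3}, {-2, -1, 6}, {3, 5, -9}, {-7, 4, 1}};

(* Compare the entry-by-entry product with the built-in Dot product *)
matMul[A, B] === A.B
```

The helper does no checking that the inner dimensions agree; in practice the built-in period (.) should be preferred, since it reports incompatible shapes.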


Example 2.3.4. Let us do another example where both AB and BA exist, but are not
equal. Let

\[
A = \begin{pmatrix} -2 & 5 & 1 \\ 3 & 0 & -4 \\ 7 & -5 & -8 \end{pmatrix}, \qquad
B = \begin{pmatrix} 6 & -8 & -3 \\ -2 & -1 & 6 \\ 3 & 5 & -9 \end{pmatrix}
\]

then

A = {{-2, 5, 1}, {3, 0, -4}, {7, -5, -8}};
B = {{6, -8, -3}, {-2, -1, 6}, {3, 5, -9}};
A.B // MatrixForm

\[
\begin{pmatrix} -19 & 16 & 27 \\ 6 & -44 & 27 \\ 28 & -91 & 21 \end{pmatrix}
\]

B.A // MatrixForm

\[
\begin{pmatrix} -57 & 45 & 62 \\ 43 & -40 & -46 \\ -54 & 60 & 55 \end{pmatrix}
\]

Note how tedious matrix multiplication is to perform when computed by hand with
matrices of dimension greater than 2 x 2.


Now, we know how to add matrices, subtract matrices, multiply a matrix
by a scalar, and multiply two matrices together. All that is left is matrix
division, or multiplication by a matrix inverse. When we consider
multiplication and division for real numbers, the unique multiplicative
inverse of a nonzero number x is $x^{-1} = 1/x$, with $x \, x^{-1} = 1$. If we consider a
set of square matrices of size n x n (e.g., we could choose $\mathbb{R}^{n \times n}$ or
$\mathbb{C}^{n \times n}$), then there is a matrix called the n x n identity matrix, denoted by $I_n$,
which acts like the scalar multiplicative identity 1 under multiplication of
these square matrices.


We need to discuss identity matrices and matrix inverses because the
matrix equation AX = B, which represents a system of linear equations,
resembles the simple algebra equation ax = b for constants a and b. Now
the equation ax = b can be solved for x if a ≠ 0, and then we obtain the
unique solution x = b/a. We would like to do the same for our matrix
equation AX = B, but this requires us to determine how to divide by
matrices such as A. The identity matrix is the only square matrix with the
property that if A is an arbitrary n x n matrix, then $A I_n = I_n A = A$. We will
make use of identity matrices later when defining the multiplicative
inverse $A^{-1}$ of a square n x n matrix A, since if $A^{-1}$ exists, we will want it
to satisfy $A A^{-1} = A^{-1} A = I_n$; hence multiplying by $A^{-1}$ will be the same as
dividing by A. The identity matrices are built into Mathematica with the
command IdentityMatrix[n], where n is the desired square dimension of
the identity matrix.


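Definition 2.3.4. The n x n identity matrix, denoted $I_n$, is the matrix that
has all ones on its main diagonal from upper left to lower right and all
zeros elsewhere:

\[
I_n = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]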


Example 2.3.5. Our last example for this section will show us that $A I_4 = I_4 A = A$ for the
matrix defined below. Hence, we can think of the 4 x 4 identity matrix $I_4$ as the
multiplicative identity for 4 x 4 matrices. Finally, we take an arbitrary 4 x 4 matrix B, and
multiply it on the left and right by $I_4$:


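(Id2 = IdentityMatrix[2]) // MatrixForm

\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]

(Id4 = IdentityMatrix[4]) // MatrixForm

\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

(A = {{1, 3, 2, 4}, {-1, -6, 0, 7}, {3, -5, 1, 9}, {0, 8, 2, 1}}) //
MatrixForm

\[
\begin{pmatrix} 1 & 3 & 2 & 4 \\ -1 & -6 & 0 & 7 \\ 3 & -5 & 1 & 9 \\ 0 & 8 & 2 & 1 \end{pmatrix}
\]

(Id4.A) // MatrixForm

\[
\begin{pmatrix} 1 & 3 & 2 & 4 \\ -1 & -6 & 0 & 7 \\ 3 & -5 & 1 & 9 \\ 0 & 8 & 2 & 1 \end{pmatrix}
\]

(A.Id4) // MatrixForm

\[
\begin{pmatrix} 1 & 3 & 2 & 4 \\ -1 & -6 & 0 & 7 \\ 3 & -5 & 1 & 9 \\ 0 & 8 & 2 & 1 \end{pmatrix}
\]

B = Array[Subscript[b, #1, #2] &, {4, 4}];
(Id4.B) // MatrixForm

\[
\begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & b_{1,4} \\
b_{2,1} & b_{2,2} & b_{2,3} & b_{2,4} \\
b_{3,1} & b_{3,2} & b_{3,3} & b_{3,4} \\
b_{4,1} & b_{4,2} & b_{4,3} & b_{4,4}
\end{pmatrix}
\]

(B.Id4) // MatrixForm

\[
\begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & b_{1,4} \\
b_{2,1} & b_{2,2} & b_{2,3} & b_{2,4} \\
b_{3,1} & b_{3,2} & b_{3,3} & b_{3,4} \\
b_{4,1} & b_{4,2} & b_{4,3} & b_{4,4}
\end{pmatrix}
\]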


With the definitions and discussion of linear systems above, we are now in 
a position to prove the following theorem concerning the number of 
simultaneous solutions to a linear system. 


Theorem 2.3.1. A linear system of equations has either no solution, 
exactly one solution, or infinitely many solutions. 


Proof. Let AX = B be the matrix form of a linear system of equations, with A
its coefficient matrix, X its column of variables, and B the column of its
RHS values. We have already seen examples of linear systems with no
solution and with exactly one solution using pairs of lines as our linear
system. All that remains is for us to show that if the linear system has at
least two distinct solutions $X_1$ and $X_2$ with $X_1 \neq X_2$, then it must have
infinitely many solutions. Note that $AX_1 = B$ and $AX_2 = B$ since $X_1$ and $X_2$
are solutions to the linear system. Let t be any real variable. We claim that
$W = X_1 + t(X_2 - X_1)$ is a solution for any value of t. Note that when
t = 0, then $W = X_1$, and when t = 1, then $W = X_2$. As t varies, W traces
out the line through the two solutions $X_1$ and $X_2$, where you get one
point on this line for each value of t. In order to see if the Ws are solutions
to AX = B, we need only replace X by W and see if AW is B or not. Now,
we obtain

\[
\begin{aligned}
AW &= AX_1 + tA(X_2 - X_1) \\
&= B + t(AX_2 - AX_1) \\
&= B + t(B - B) \\
&= B + t \cdot 0 \\
&= B
\end{aligned}
\]

So all the Ws are solutions to AX = B. Since $X_2 - X_1 \neq 0$, different values
of t give different Ws, and so there are an infinite number of
Ws, one for each value of t.
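To see the argument of the proof at work, here is a small sketch on a system of our own choosing. The matrix A, the column B, and the two solutions X1 and X2 below are hypothetical illustration data, not taken from the text; the only thing being checked is that A.W - B vanishes identically in t.

```mathematica
(* A hypothetical system of two equations in three unknowns *)
A = {{1, 1, 1}, {1, -1, 2}};
B = {3, 2};

(* Two distinct solutions, found by inspection *)
X1 = {1, 1, 1};
X2 = {4, 0, -1};
{A.X1 == B, A.X2 == B}

(* The line through X1 and X2, with t left symbolic *)
W = X1 + t (X2 - X1);

(* A.W - B should simplify to the zero column for every t *)
Simplify[A.W - B]
```

Because t is never given a value, a zero result here covers every point on the line at once, which is exactly the claim made in the proof.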


We conclude this section with a list of useful properties of matrix addition 
and multiplication. In the following table, a and b are scalars, while A, B, 
and C are arbitrary matrices. For each equation, remember that there are 
restrictions on the dimensions of the matrices that must be satisfied in 
order for the equation to make sense. The properties, however, will hold 
for any matrices for which the equation can be used. You will verify many 
of the following matrix arithmetic properties in the homework and 
Mathematica problems at the end of this section. 




Basic Matrix Addition and Multiplication Properties

Matrix addition        | A + B = B + A
                       | (A + B) + C = A + (B + C)
                       | A + (-A) = 0
Scalar multiplication  | a(bA) = (ab)A
                       | a(A + B) = aA + aB
                       | (a + b)A = aA + bA
Matrix multiplication  | (AB)C = A(BC)
                       | A(B + C) = AB + AC
                       | (A + B)C = AC + BC
                       | AI = IA = A
                       | A0 = 0A = 0
                       | a(AB) = (aA)B = A(aB)
                       | AB ≠ BA in general
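A few of these properties can be spot-checked symbolically. The sketch below uses generic 2 x 2 matrices built with Array; the names P, Q, R and the entry symbols p, q, r are our own choices, and a check on one size is an illustration, not a proof for all sizes.

```mathematica
(* Generic 2 x 2 matrices with symbolic entries p[i, j], q[i, j], r[i, j] *)
P = Array[p, {2, 2}];
Q = Array[q, {2, 2}];
R = Array[r, {2, 2}];

(* Each difference should simplify to the 2 x 2 zero matrix *)
Simplify[(P.Q).R - P.(Q.R)]          (* associativity *)
Simplify[P.(Q + R) - (P.Q + P.R)]    (* left distributivity *)
Simplify[a (P.Q) - (a P).Q]          (* scalars move through products *)

(* This difference is generally not zero: multiplication does not commute *)
Simplify[P.Q - Q.P]
```

The last line returns a matrix of nonzero polynomial entries, which is the symbolic counterpart of the statement that AB and BA differ in general.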


Homework Problems 


1. Consider the following matrices: 


Perform the following matrix operations: 




(a) A - 2B   (b) 5A - 3C   (c) 2A + 3B - 4C   (d) 3(A - B + C)
(e) 2B + 5C   (f) D - 3F + 4E   (g) 6D + 3F   (h) 2E + 3F
(i) 6D - 4E + 2F   (j) 2(D - 3E) + 5F   (k) B - 4C + 3A   (l) 2(6D - 5F)


2. Consider the following matrices: 


2 

y ay t-78 
a=] | B=] | cala 6 
2 2 2 23 a 
1-7 8 a 

1 42 

D=] 6 £5) B-|7% 7 3| pal ? 
EE > ağ 8 -2 
-2 8 -7 6 -3 


Determine which of the following matrix multiplications can be 
performed: 


(a) AA (b)AB (c)AC (d)CA_ (e) BB 


(f)FD (g)DA (h)BC’ (i)CF (j) DB 


3. Find all 12 possible combinations of matrices from problem 2 
that will allow matrix multiplication to be performed. 


4. Find a value of a for which the following two matrices satisfy AB 
= BA: 


am| 3 ij B=[2 2, 


5. Perform the following matrix multiplications: 




2 -1 
D 2 ETL ss 
(e) a : ris 6) A fl |e d 
a + Belle a 


6. Write the following systems of equations in the matrix form AX 
=B: 


(a) 2z — 3y = 7 (b) 232 — 6y +42 = 2 
ár + 5y = 2 147 + 6y —5z = 4 
—5r + 4y = -1 


(c) -w+a—4y+3z=7 (d) — 9w + 3z + 4y -6z =3 
2w + 4y —5z=6 w — 5x + 3y + 7z = —8 
8w — 22 —7z=9 Sw —8r+y—5z=3 

6w + x + 9y + 132 = —16 


7. Consider the following three matrices: 


20 4 =E 1, 2 4 -2 9 
A=|ő 1 -2 B=|-6 4 1 C=|5 -8 0 
1 6 =T 70 2 2 t =i! 


Show that the following hold (assuming $c \in \mathbb{R}$):
(a) A(BC) = (AB)C   (b) A(B + C) = AB + AC   (c) c(AB) = (cA)B
(d) (A + B)C = AC + BC   (e) A(cB) = (Ac)B   (f) (AB)c = A(Bc)


8. For part (a) of problem 7, what are the general dimensions of A, 
B, and C such that one can perform the matrix multiplication? Do 
the same for the matrices of parts (b) and (c). 




9. Letting c be a scalar and A be any 2 x 2 matrix, show that

\[
cA = \begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix} A
\]

Generalize this to any-size matrix A so that scalar multiplication is
turned into matrix multiplication by a diagonal matrix.
10. Let A, B, C, and D be four 2 x 2 matrices. Let

\[
K = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}, \qquad
L = \begin{pmatrix} C & 0 \\ 0 & D \end{pmatrix}
\]

be two block diagonal 4 x 4 matrices with A, B, C, and D on their
diagonals, where the zeros in these two matrices are the zero 2 x 2
matrices. Show that

\[
KL = \begin{pmatrix} AC & 0 \\ 0 & BD \end{pmatrix}, \qquad
K^n = \begin{pmatrix} A^n & 0 \\ 0 & B^n \end{pmatrix}
\]

for any positive integer n. Generalize this problem where there is no
restriction on the sizes of A, B, C, and D as long as they are square
matrices, although you may want some of their sizes to be the same.
Will this work for larger block diagonal matrices?

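11. Define the set M as follows:

\[
M = \left\{ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} : a, b \in \mathbb{R} \right\}
\]

Show that the following are true:
(a) For any two K, L ∈ M, KL = LK; that is, matrix
multiplication in M is commutative.
(b) Show that

\[
\begin{pmatrix} a & b \\ -b & a \end{pmatrix} \begin{pmatrix} a & -b \\ b & a \end{pmatrix}
= \begin{pmatrix} a^2 + b^2 & 0 \\ 0 & a^2 + b^2 \end{pmatrix}
= (a^2 + b^2) I_2
\]

where $I_2$ is the 2 x 2 identity matrix.
(c) Show that

\[
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
= \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I_2
\]

Hence

\[
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\]

is a square root of $-I_2$.

(d) Do these properties of M remind you of any other set and
its properties?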
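12. Let $A, C \in \mathbb{R}^{2 \times 2}$ be defined as follows:

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
C = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]

with ad - bc ≠ 0. Show that $AC = I_2$ and $CA = I_2$. What does this
make C with respect to A?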


13. Use the result from problem 12 to solve the linear system

\[
\begin{aligned}
5x - 7y &= 11 \\
9x + 2y &= -4
\end{aligned}
\]

Check your answer by another means.


14. Let your 2 x 2 linear system of equations be given as the matrix 
equation AX = B, with 


a[o elti -ala ] 


where ad - bc ≠ 0. Using the result of problem 12, what is the
solution formula for X?




Mathematica Problems 


1. Perform all 12 matrix multiplications that you found in 
homework problem 3. 

2. Verify your answers to homework problem 5. 

3. Compute $A^2, A^3, A^4, \ldots, A^n$ for each of the following matrices:


-2 0 0 Hw 3 
ol ® | 05 Of @ Joo -8 
0 0 «7 00 0 


4. Find a pattern to the sequence {A”}, n = 1,2,... for each of the 
following matrices: 


oF T il ee: -t t = 
(a)| 0 1 Of 6 O01 1] (@}] 11 -1 
1o 1 00 0 ti a 


5. Show that for arbitrary A, B, C € R4 and c e R, the following 
hold: 


eNOS 
“yooo 


(a) A(BC) = (AB)C   (b) A(B + C) = AB + AC   (c) c(AB) = (cA)B
(d) (A + B)C = AC + BC   (e) A(cB) = (Ac)B   (f) (AB)c = A(Bc)


6. Use Mathematica to find the formula for $A^{-1}$ given in homework
problem 12.

7. Use Mathematica to find the formula for X in homework problem 
14. 

8. Use Mathematica to check homework problem 10. 




Chapter 3 


Gauss—Jordan Elimination 
and Reduced Row Echelon 
Form 


3.1 Gauss—Jordan 
Elimination and rref 


Gauss—Jordan elimination is a popular method for solving systems of 
linear equations. Examples 2.2.2 and 2.2.3 were actually examples of 
Gauss-Jordan elimination, which we now wish to formalize. 
Gauss—Jordan elimination uses three types of row operations in order to 
change the original augmented matrix of the system into a new augmented 
matrix whose form allows us to solve the system by simple inspection. 
The three types of row operations allowed in Gauss—Jordan elimination 
are switching or swapping any two rows, multiplying a row by a nonzero 
number, and finally multiplying a row by a nonzero number and then 
adding it to another row. These three row operations correspond to 
algebraic operations done on the linear system of equations that do not 
alter the solution set to the linear system, and they are the only operations 
that are needed to solve any linear system for its solution set. 
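The three row operations just described can be packaged as small functions. What follows is a sketch only: swapRows, scaleRow, and addMultiple are hypothetical names of our own, not built-in Mathematica commands, and the matrix is the augmented matrix of Example 2.2.2.

```mathematica
(* Hypothetical helpers; each returns a new matrix and leaves m unchanged *)
swapRows[m_, i_, j_] := ReplacePart[m, {i -> m[[j]], j -> m[[i]]}]
scaleRow[m_, i_, c_] := ReplacePart[m, i -> c m[[i]]]           (* c must be nonzero *)
addMultiple[m_, c_, i_, j_] := ReplacePart[m, j -> c m[[i]] + m[[j]]]  (* row j + c (row i) *)

CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {7, 0, -4, 5, 2}};

(* Scale row 1 to get a leading 1, then clear the entry below it in row 2 *)
step = addMultiple[scaleRow[CM, 1, 1/6], 9, 1, 2];
step // MatrixForm

(* Row operations do not change the solution set, so the rref is unchanged *)
RowReduce[step] === RowReduce[swapRows[CM, 1, 3]]
```

The final comparison uses the built-in RowReduce command only as a check that the helper functions have not altered the solution set.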


The final augmented matrix of Gauss—Jordan elimination is called a rref 
matrix, which is short for reduced row echelon form. Note that the rref 
form is unique, that is, rref is independent of the order of the original rows 
of the augmented matrix, or equivalently, independent of the order of the 




original equations in the linear system. Every row of the rref matrix is
required to be either all zeros or have a leading 1 in the row preceded by
all zeros in its row. As you go down the rows of the rref matrix, the
leading 1s of the matrix must move left to right with no two of them
occurring in the same column. Furthermore, any row of all zeros must be
placed at the bottom of the final matrix in rref form. Finally, each column
of the rref matrix that contains a leading 1 must have all other entries in
the column be 0. The following are all examples of matrices in rref form:


LO 0 a l1aoOobed 
Ege 001lef 
001c 9 
100abe 1 b 0 0 c 
010def 00010d 
00 1gh i 0000it1.e 
000 0 0 0 00 0 0 0 0 
1 0abed 1 a6 ¢ oo 
Olefgh 000010 
0 0 0 0 0 0 00 0 0 0 1 
00 0 00 0 0000 0 0 


(3.1) 


The process of Gauss-Jordan elimination produces a rref matrix with the
properties described above by working on one column at a time, going left
to right through the matrix, and trying to obtain a leading 1 as
high up in each column as possible, followed by zeros elsewhere in the
column. The least used of the three types of row operations in
Gauss-Jordan elimination is the switching of two rows. It is used only to
obtain a nonzero value in a leading 1 location so that you can then turn it
into a leading 1. In the preferred final form of the rref matrix in
Gauss-Jordan elimination, the upper-left corner of the matrix is an
identity matrix of a specific size. In Mathematica, the command
RowReduce performs Gauss-Jordan elimination.


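Example 3.1.1. First, let us repeat a previous example, specifically Example 2.2.2. We
want to solve the linear system

\[
\begin{aligned}
6x + y + 4w &= -3 \\
-9x + 2y + 3z - 8w &= 1 \\
7x - 4z + 5w &= 2
\end{aligned}
\]

The augmented matrix of this system is

\[
C = \begin{pmatrix} 6 & 1 & 0 & 4 & -3 \\ -9 & 2 & 3 & -8 & 1 \\ 7 & 0 & -4 & 5 & 2 \end{pmatrix}
\]

Our goal is to transform the matrix above to one of the following form:

\[
D = \begin{pmatrix} 1 & 0 & 0 & d_{14} & d_{15} \\ 0 & 1 & 0 & d_{24} & d_{25} \\ 0 & 0 & 1 & d_{34} & d_{35} \end{pmatrix}
\]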


Now we will use Mathematica to perform several operations used in Gauss-Jordan 
elimination. First, however, we will let Mathematica convert the system to rref form via 
the RowReduce command: 


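CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {7, 0, -4, 5, 2}};
RowReduce[CM] // MatrixForm

\[
\begin{pmatrix}
1 & 0 & 0 & \frac{7}{9} & -\frac{34}{63} \\
0 & 1 & 0 & -\frac{2}{3} & \frac{5}{21} \\
0 & 0 & 1 & \frac{1}{9} & -\frac{13}{9}
\end{pmatrix}
\]

Next, we swap rows 1 and 3 of C, and then use the RowReduce command. We also do
the same with rows 2 and 3:

RowReduce[{CM[[3]], CM[[2]], CM[[1]]}] // MatrixForm

\[
\begin{pmatrix}
1 & 0 & 0 & \frac{7}{9} & -\frac{34}{63} \\
0 & 1 & 0 & -\frac{2}{3} & \frac{5}{21} \\
0 & 0 & 1 & \frac{1}{9} & -\frac{13}{9}
\end{pmatrix}
\]

RowReduce[{CM[[1]], CM[[3]], CM[[2]]}] // MatrixForm

\[
\begin{pmatrix}
1 & 0 & 0 & \frac{7}{9} & -\frac{34}{63} \\
0 & 1 & 0 & -\frac{2}{3} & \frac{5}{21} \\
0 & 0 & 1 & \frac{1}{9} & -\frac{13}{9}
\end{pmatrix}
\]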

It is clear that the rref matrix is independent of the order of the equations, or equivalently, 
the rows in the augmented matrix C of the linear system. 


Notice that the 3 x 3 identity matrix appears on the left in the rref matrix above. This
form of the final augmented matrix of a linear system is what Gauss-Jordan elimination
tries to produce. This final augmented matrix says that the solution to our original system
is

\[
x = -\frac{7}{9} w - \frac{34}{63}, \qquad
y = \frac{2}{3} w + \frac{5}{21}, \qquad
z = -\frac{1}{9} w - \frac{13}{9}
\tag{3.2}
\]


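This solution is written in terms of one independent variable w, and so it is
one-dimensional, which is a line in $\mathbb{R}^4$ since we have four variables x, y, z, and w. We
can also express the solution given in (3.2) as the sum of two column matrices:

\[
X = \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
= \begin{pmatrix} -\frac{34}{63} \\ \frac{5}{21} \\ -\frac{13}{9} \\ 0 \end{pmatrix}
+ \begin{pmatrix} -\frac{7}{9} \\ \frac{2}{3} \\ -\frac{1}{9} \\ 1 \end{pmatrix} w
\tag{3.3}
\]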


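This form will be used later when we discuss bases and dimensions of vector spaces. It
should be clear that the solution expressed in (3.3) is one-dimensional, since there is a
single parameter w multiplied by the second column matrix, and the first column matrix
represents a single point, which is of dimension 0. Furthermore, if we express the solution
given in (3.3) as $X = X_p + X_h w$, where

\[
X_p = \begin{pmatrix} -\frac{34}{63} \\ \frac{5}{21} \\ -\frac{13}{9} \\ 0 \end{pmatrix}, \qquad
X_h = \begin{pmatrix} -\frac{7}{9} \\ \frac{2}{3} \\ -\frac{1}{9} \\ 1 \end{pmatrix}
\]

then we get

\[
\begin{pmatrix} 6 & 1 & 0 & 4 \\ -9 & 2 & 3 & -8 \\ 7 & 0 & -4 & 5 \end{pmatrix} X_p
= \begin{pmatrix} -3 \\ 1 \\ 2 \end{pmatrix}, \qquad
\begin{pmatrix} 6 & 1 & 0 & 4 \\ -9 & 2 & 3 & -8 \\ 7 & 0 & -4 & 5 \end{pmatrix} X_h
= \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]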
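The two products just stated can be confirmed with the dot product from Section 2.3. In this sketch, Xp and Xh are our own variable names for the two column matrices of (3.3), and A is the coefficient matrix of the system (the augmented matrix without its last column).

```mathematica
(* Coefficient matrix of the system in Example 3.1.1 *)
A = {{6, 1, 0, 4}, {-9, 2, 3, -8}, {7, 0, -4, 5}};

(* The two column matrices from (3.3), entered as lists *)
Xp = {-34/63, 5/21, -13/9, 0};
Xh = {-7/9, 2/3, -1/9, 1};

A.Xp   (* should reproduce the right-hand sides of the system *)
A.Xh   (* should be the zero column *)

(* Hence every X = Xp + w Xh solves the system, whatever the value of w *)
Simplify[A.(Xp + w Xh)]
```

The last line is the same computation as in the proof of Theorem 2.3.1: the w-dependent part is annihilated by A, so only A.Xp survives.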


The only unfortunate aspect of using the command RowReduce is that it does not 
actually give the solution to the original system. You must still write out the solution 
yourself from the rref matrix. On the other hand, the Solve command gives the actual 
solution to the system where you can specify which variables to solve for in the system. 
In this case, you know in advance that the dimension of the solution is most likely one 
and that of the four variables x, y, z, and w, you can probably solve for any three of them 
in terms of the remaining fourth variable: 


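Solve[{6 x + y + 4 w == -3, -9 x + 2 y + 3 z - 8 w == 1, 7 x - 4 z +
5 w == 2}, {x, y, z}]

\[
\left\{\left\{ x \to \tfrac{1}{63}(-34 - 49w),\; y \to \tfrac{1}{21}(5 + 14w),\; z \to \tfrac{1}{9}(-13 - w) \right\}\right\}
\]

Solve[{6 x + y + 4 w == -3, -9 x + 2 y + 3 z - 8 w == 1, 7 x - 4 z +
5 w == 2}, {y, z, w}]

\[
\left\{\left\{ y \to \tfrac{1}{49}(-11 - 42x),\; z \to \tfrac{1}{49}(-67 + 7x),\; w \to \tfrac{1}{49}(-34 - 63x) \right\}\right\}
\]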


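Example 3.1.2. Let us do another example, similar to the first. We want to solve the
slightly altered linear system

\[
\begin{aligned}
6x + y + 4w &= -3 \\
-9x + 2y + 3z - 8w &= 1 \\
7x - z + 5w &= 2
\end{aligned}
\]

CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {7, 0, -1, 5, 2}};
RowReduce[CM] // MatrixForm

\[
\begin{pmatrix}
1 & 0 & -\frac{1}{7} & 0 & \frac{67}{7} \\
0 & 1 & \frac{6}{7} & 0 & -\frac{59}{7} \\
0 & 0 & 0 & 1 & -13
\end{pmatrix}
\]

Solve[{6 x + y + 4 w == -3, -9 x + 2 y + 3 z - 8 w == 1, 7 x - z + 5 w ==
2}, {x, y, w}]

\[
\left\{\left\{ x \to \tfrac{1}{7}(67 + z),\; y \to \tfrac{1}{7}(-59 - 6z),\; w \to -13 \right\}\right\}
\]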




From the rref matrix given above, we see that w = -13 and that x and y can be expressed
as functions of the single variable z. Hence, the solution is one-dimensional, with the
solution given in the result of the Solve command above. In column matrix form, we
would express this as follows:

\[
X = \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
= \begin{pmatrix} \frac{67}{7} \\ -\frac{59}{7} \\ 0 \\ -13 \end{pmatrix}
+ \begin{pmatrix} \frac{1}{7} \\ -\frac{6}{7} \\ 1 \\ 0 \end{pmatrix} z
\]


Something interesting happens when we try to solve the system of equations in terms of
the variables x, y, and z:

Solve[{6 x + y + 4 w == -3, -9 x + 2 y + 3 z - 8 w == 1, 7 x - z + 5 w ==
2}, {x, y, z}]

{}

In this situation, the system forces w = -13, so w cannot be left as a free parameter and
x, y, and z cannot be expressed in terms of w. Nor can they be determined without w,
since one of them must remain free. This makes it impossible to solve this
system for only the variables x, y, and z.


Now, let us look at an example of a linear system where there is no 
solution and how the rref matrix will reflect this fact. 


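Example 3.1.3. We want to solve the linear system

\[
\begin{aligned}
6x + y + 4w &= -3 \\
-9x + 2y + 3z - 8w &= 1 \\
-3x + 3y + 3z - 4w &= 10
\end{aligned}
\]

The augmented matrix of this system is

\[
C = \begin{pmatrix} 6 & 1 & 0 & 4 & -3 \\ -9 & 2 & 3 & -8 & 1 \\ -3 & 3 & 3 & -4 & 10 \end{pmatrix}
\]

CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {-3, 3, 3, -4, 10}};
RowReduce[CM] // MatrixForm

\[
\begin{pmatrix}
1 & 0 & -\frac{1}{7} & \frac{16}{21} & 0 \\
0 & 1 & \frac{6}{7} & -\frac{4}{7} & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

Solve[{6 x + y + 4 w == -3, -9 x + 2 y + 3 z - 8 w == 1, -3 x + 3 y +
3 z - 4 w == 10}, {x, y, z}]

{}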


The last row of the rref matrix gives the equation 0 = 1 when converted back to an
equation. Since this is clearly impossible, it indicates that this linear system has no
solution and so is said to be inconsistent. This feature of the rref matrix will always
appear when there is no solution. Also notice that when we use the Solve command on
the system, Mathematica returns the empty list {}; thus the system has no solution.
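The test for inconsistency described above can be automated by looking for the telltale row in the rref matrix. The following is a minimal sketch; the names rref, badRow, and inconsistentQ are our own and are not built-in commands.

```mathematica
CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {-3, 3, 3, -4, 10}};
rref = RowReduce[CM];

(* The row 0 0 ... 0 1, which corresponds to the impossible equation 0 == 1 *)
badRow = Append[ConstantArray[0, Length[First[rref]] - 1], 1];

(* True exactly when the rref matrix contains that row *)
inconsistentQ = MemberQ[rref, badRow]
```

Because a leading 1 in the last column forces every other entry of its row to be 0, searching for this one row is enough to detect an inconsistent system from its rref matrix.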


There are also linear systems where the rref matrix contains rows that are 
all zeros. This indicates that at least one of the equations of the system is 


not needed in order to solve the system. The following is an example of 
this situation. 


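Example 3.1.4. We want to solve the linear system

\[
\begin{aligned}
6x + y + 4w &= -3 \\
-9x + 2y + 3z - 8w &= 1 \\
-3x + 3y + 3z - 4w &= -2
\end{aligned}
\]

The augmented matrix of this system is

\[
C = \begin{pmatrix} 6 & 1 & 0 & 4 & -3 \\ -9 & 2 & 3 & -8 & 1 \\ -3 & 3 & 3 & -4 & -2 \end{pmatrix}
\]

CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {-3, 3, 3, -4, -2}};
RowReduce[CM] // MatrixForm

\[
\begin{pmatrix}
1 & 0 & -\frac{1}{7} & \frac{16}{21} & -\frac{1}{3} \\
0 & 1 & \frac{6}{7} & -\frac{4}{7} & -1 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

Solve[{6 x + y + 4 w == -3, -9 x + 2 y + 3 z - 8 w == 1, -3 x + 3 y +
3 z - 4 w == -2}, {x, y}]

\[
\left\{\left\{ x \to \tfrac{1}{21}(-7 - 16w + 3z),\; y \to \tfrac{1}{7}(-7 + 4w - 6z) \right\}\right\}
\]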


This linear system has a solution of dimension two in $\mathbb{R}^4$. The last row of all zeros in the
rref matrix indicates that one of these three equations is superfluous to solving the
system. Hence, one of the equations can be written as a linear combination of the
remaining two, as this is the only way that our three operations can reduce an entire row
of an augmented matrix to all zeros. A linear equation is called a linear combination of
other linear equations if it is a sum of scalar multiples of the other equations. Here, the
third equation is the sum of the other two equations, and so it is a linear combination of
them.
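The dependence among the three equations can be checked directly on the rows of the augmented matrix. This sketch uses the built-in MatrixRank command, which has not been introduced in the text; we use it here only as a count of the nonzero rows left in the rref matrix.

```mathematica
CM = {{6, 1, 0, 4, -3}, {-9, 2, 3, -8, 1}, {-3, 3, 3, -4, -2}};

(* Row 3 is row 1 plus row 2, i.e., equation 3 is the sum of equations 1 and 2 *)
CM[[1]] + CM[[2]] == CM[[3]]

(* Number of nonzero rows remaining after row reduction *)
MatrixRank[CM]
```

With only two independent equations in four variables, two variables remain free, which matches the two-dimensional solution found above.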


All of the previous examples have been systems with more variables than 
equations that typically have an infinite number of solutions since their 
solution will contain at least one arbitrary independent variable. A square 
system with as many variables as equations usually has only one solution. 


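Example 3.1.5. Let us solve the following square 4 x 4 system given by

\[
\begin{aligned}
-7x + 2y - 9z + 3w &= -5 \\
4x + y + 6z - 11w &= 15 \\
x - y + 3z + 8w &= 4 \\
9x + 12y - 7z + 5w &= -2
\end{aligned}
\]

The system is said to be 4 x 4 since there are four equations and four unknowns. In
general, if there are m equations and n variables, the resulting augmented matrix will be
an element of $\mathbb{R}^{m \times (n+1)}$. The augmented matrix of this system is then

\[
\begin{pmatrix}
-7 & 2 & -9 & 3 & -5 \\
4 & 1 & 6 & -11 & 15 \\
1 & -1 & 3 & 8 & 4 \\
9 & 12 & -7 & 5 & -2
\end{pmatrix}
\]

CM = {{-7, 2, -9, 3, -5}, {4, 1, 6, -11, 15}, {1, -1, 3, 8, 4}, {9,
12, -7, 5, -2}};
(R = RowReduce[CM]) // MatrixForm

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & -\frac{5881}{2056} \\
0 & 1 & 0 & 0 & \frac{8511}{2056} \\
0 & 0 & 1 & 0 & \frac{950}{257} \\
0 & 0 & 0 & 1 & -\frac{23}{2056}
\end{pmatrix}
\]

R[[All, -1]]

\[
\left\{ -\tfrac{5881}{2056},\; \tfrac{8511}{2056},\; \tfrac{950}{257},\; -\tfrac{23}{2056} \right\}
\]

This 4 x 4 square system has the unique solution

\[
x = -\frac{5881}{2056}, \qquad y = \frac{8511}{2056}, \qquad z = \frac{950}{257}, \qquad w = -\frac{23}{2056}
\]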


Square systems typically have a single solution, which is why people 
prefer square systems whenever possible in solving a real-world problem. 


Definition 3.1.1. A determined, or square, linear system is a system in
which the number of equations m equals the number of unknown variables
n.


Definition 3.1.2. An underdetermined linear system is a system in which
the number of equations m is less than the number of unknown variables
n.


Definition 3.1.3. An overdetermined linear system is a system in which 
the number of equations m is greater than the number of unknown 
variables n. 




Underdetermined systems typically have an infinite number of solutions, 
while overdetermined systems typically have no solution, as we shall see 
next. 


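Example 3.1.6. As an example, let us try to solve the overdetermined linear system

$$
\begin{aligned}
7x + 2y + 9z &= 8 \\
-3x + 5y + 6z &= -2 \\
11x - y + 4z &= 3 \\
8x + 13y - 10z &= -11
\end{aligned}
$$

The augmented matrix of this system is

$$
\left(\begin{array}{rrr|r} 7 & 2 & 9 & 8 \\ -3 & 5 & 6 & -2 \\ 11 & -1 & 4 & 3 \\ 8 & 13 & -10 & -11 \end{array}\right)
$$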


CM = {{7, 2, 9, 8}, {-3, 5, 6, -2}, {11, -1, 4, 3}, {8, 13, -10,
-11}};
RowReduce[CM] // MatrixForm


1  0  0  0
0  1  0  0
0  0  1  0
0  0  0  1

The last row of this rref matrix gives the equation 0 = 1, which is impossible. This
implies that our overdetermined system has no solution. However, if we remove the last
equation from the original system, we end up with the following augmented matrix of a
determined system, that is, one with three equations in three unknown variables:

$$
C_1 = \left(\begin{array}{rrr|r} 7 & 2 & 9 & 8 \\ -3 & 5 & 6 & -2 \\ 11 & -1 & 4 & 3 \end{array}\right)
$$




CM2 = {{7, 2, 9, 8}, {-3, 5, 6, -2}, {11, -1, 4, 3}};
RowReduce[CM2] // MatrixForm


1  0  0  -11/10
0  1  0  -43/10
0  0  1   27/10


So, by removing one equation, we get the unique solution:

$$
x = -\frac{11}{10}, \quad y = -\frac{43}{10}, \quad z = \frac{27}{10}
$$


If we remove yet one more equation, we end up with two equations and three unknowns, 
which implies a solution of dimension one: 


(CM3 = {{7, 2, 9, 8}, {-3, 5, 6, -2}}) // MatrixForm

 7  2  9   8
-3  5  6  -2

RowReduce[CM3] // MatrixForm

1  0  33/41  44/41
0  1  69/41  10/41


The solution to this system is clearly one-dimensional, given by

$$
x = -\frac{33}{41} z + \frac{44}{41}, \quad y = -\frac{69}{41} z + \frac{10}{41}
$$
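As a consistency check between the last two steps of this example, the one-dimensional solution should pass through the unique solution of the three-equation system. The sketch below (the name sol is our own) solves the two-equation system for x and y with the Solve command and then substitutes z = 27/10, the z value found for the three-equation system.

```mathematica
(* solve the two remaining equations for x and y, leaving z free *)
sol = First[Solve[{7 x + 2 y + 9 z == 8, -3 x + 5 y + 6 z == -2}, {x, y}]];

(* substitute the z value of the three-equation solution *)
{x, y} /. sol /. z -> 27/10
```

The result is {-11/10, -43/10}, the x and y values of the unique solution obtained before the third equation was removed.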


Homework Problems 


1. Give both an algebraic and geometric explanation of why 
underdetermined systems typically have an infinite number of 
solutions while overdetermined systems typically have no solution. 




2. Using your answer to problem 1, construct (if possible) 
underdetermined and overdetermined systems that have a single 
solution, that is, a solution of dimension zero. 

3. Perform Gauss-Jordan elimination on the following systems of
equations, but with one restriction: You are not allowed to swap
row 1 with any other. Be sure to show each step in the process.


(a) -2x + 6y + 3z = 4            (b) 2x + 3y - 5z = 7
    x - 3y + z = 6                   3x + 2y + 7z = 8
    2x + 5y + 6z =                   4x + 6y + 2z =
(c) 2w + 3x + 4y + 2z = 5        (d) 4w + 6x + y - 7z = 8
    x + y + z = 1                    w + x + y + z =
    2w + 2x + 3y + 2z = 4            2w + 2x + 2y - 3z = 5
    4w + 6x + 8y + 4z = 10           -w + x + y + z = 6


(e) -2x + 4y - 8z - 3u + v = 2   (f) -2x + 4y - 8z =
    3x + 4y - 7z - u - 5v = 1        3x + 4y - 7z = 1
    5x + z + u + v = -1              5x + z = -1
    -3x + 4y + 3z = 2
4. Write out the solutions to each system from problem 3, and give 
the dimension of the solution and the space R” that it lies in. 


5. Write out the solutions to each system from problem 3 in column 
matrix format using scalar multiplication by the independent 
variables. 


6. Would Gauss-Jordan elimination be easier to implement in any 
of the systems of problem 3 if you were allowed to use the row 
swap operation? If so, explain and go through the process of 
Gauss—Jordan elimination again, using the row swap operation. 


7. Perform Gauss—Jordan elimination on the following matrices: 
2 
2-3 6 1 
2 -3 6 1 1 -1 1 4 
Jl E | » 8 -4 2 3 |. (c) E j 
8 4 2 7 325 z 5 9 T 24 


8. Given the following two systems of equations:


(a) 3x + 4y = -1        (b) 3x + 4y = 9
    4x - 2y = 6             4x - 2y = -10


Explain how the following matrix can be used to solve both systems
simultaneously, then do so:

$$
\left(\begin{array}{rr|rr} 3 & 4 & -1 & 9 \\ 4 & -2 & 6 & -10 \end{array}\right)
$$

9. Is it possible for a rref matrix to have more than one row whose
corresponding linear equation is 0 = 1? If no, explain why not. If
yes, then give an example.


10. A square matrix is called upper triangular if all of its entries 
below the main diagonal from upper left to lower right are 0. If A is 
a square matrix, then must rref (A) be upper triangular? If yes, 
explain why. If no, then give an example. 

11. What would have to be true about a linear system so that the 
rref matrix of the augmented matrix of this system has all rows of 
all zeros except for the first row? Give examples of the possibilities. 
12. What would have to be true about a linear system so that the 
rref matrix of the augmented matrix of this system is an identity 
matrix? Give examples of the possibilities. 

13. What would have to be true about a linear system so that the 
rref matrix of the augmented matrix of this system is an identity 
matrix to the left of the last column? Give examples of the 
possibilities. 


Mathematica Problems 


1. Solve homework problems 3 and 5 using the row swap and add a 
multiple of one row to another operations. 


2. Solve homework problems 3 and 5 using the RowReduce 
command. 


3. Perform Gauss—Jordan elimination using the standard three row 
operations on the following matrices: 


4. For each rref matrix given below, find a corresponding original
linear system of equations whose rref matrix yields the rref form.
Your original system's augmented matrix should not resemble the
rref matrix that you are attempting to reduce to:


as oceo 


vne o 
Ss Llo 
su no 
coo - & 
O oOo 


“oco 


v= oO 
o ao 
ieee a 
seo 


oOo « © 


1 
| o 
5. Let 




0000 0 0 


Perform rref on the augmented matrix $(A|I_2)$ of A with the 2 x 2
identity matrix after it. You will get a 2 x 4 matrix $(I_2|B)$, with the 2
x 2 identity matrix followed by a 2 x 2 matrix B. Do you recognize
B? If not, then find the two products AB and BA; do you now know
what B is? Generalize this to larger square matrices A and see if it
still works.


6. Let

$$
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
$$

The row operation of multiplying row 1 of A by k and then adding
the result to row 2 of A gives

$$
\begin{pmatrix} a & b \\ ka + c & kb + d \end{pmatrix}
$$

If we perform this row operation on the 2 x 2 identity matrix $I_2$, we
get

$$
B = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}
$$

What is the product BA? Generalize this so that each of the three
types of row operations applied to A can be done through
multiplication by a matrix like B.


7. Solve in as many ways as possible the linear systems that have 
the augmented matrices in Mathematica problem 4 of this section. 
Write your solutions using column matrices. 


3.2 Elementary Matrices 


In Section 3.1, we discovered that Gauss-Jordan elimination (rref) uses 
three different kinds of row operations to convert a matrix to the final 
augmented matrix, which in turn provides the solutions to the linear 
system. Each of these three row operations can be done through the use of 
matrix multiplication on the left by what are called elementary matrices. 




The multiplication by the elementary matrix E on the left must preserve 
the size of the matrix A. 


Thus, if A is of size l x n, then for E of size m x k, the product EA is
defined only if k = l, and it has the same size as A only if m = l. Hence
an elementary matrix E must be square and of size l x l if it is to be
applied, by left multiplication, to matrices A of size l x n, and be size
preserving as well.


Example 3.2.1. Let us do an example of the first row operation of swapping two rows
and how it can be done using multiplication by an elementary matrix E. This type of
elementary matrix is referred to as type I. First, we will define

$$
A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{pmatrix}
$$

We now ask: What matrix $E \in \mathbb{R}^{3 \times 3}$ will swap the first and third rows of A when you
compute EA? So, if B = EA, then

$$
B = \begin{pmatrix} 9 & 10 & 11 & 12 \\ 5 & 6 & 7 & 8 \\ 1 & 2 & 3 & 4 \end{pmatrix}
$$

The answer, after some thought, is that we need

$$
E = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
$$

which is the matrix obtained by swapping the first and third rows of the 3 x 3 identity
matrix

$$
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

Multiplication of A by $I_3$ on the left gives A back since $I_3 A = A$, so swapping $I_3$'s rows
should swap A's rows; at least we hope that it will. Notice that $EE = I_3$ (spend 2 minutes
performing the multiplication to convince yourself; it is time well spent) since
multiplication by E twice should bring us back to where we started, as the second E swaps
the rows back to their original positions.




(A = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}}) // MatrixForm


1   2   3   4
5   6   7   8
9  10  11  12


(ET1 = {{0, 0, 1}, {0, 1, 0}, {1, 0, 0}}) // MatrixForm

0  0  1
0  1  0
1  0  0

ET1.A // MatrixForm


9  10  11  12
5   6   7   8
1   2   3   4


ET1.ET1 // MatrixForm


1  0  0
0  1  0
0  0  1


Example 3.2.2. Now let us do an example of the second row operation of multiplying a
row by a nonzero number c, and see how it can be done in a similar manner. We want to
multiply the second row of the preceding matrix A by -5. Then it seems that the
matrix E we need should be

$$
E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

which is an elementary matrix of type II. In other words, the pattern seems to be that to
find the correct matrix E, we must perform the row operation on the 3 x 3 identity matrix
$I_3$. The matrix that undoes this row operation (or undoes E, as it may be said) is

$$
F = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

since $FE = EF = I_3$. Then


(ET2 = {{1, 0, 0}, {0, -5, 0}, {0, 0, 1}}) // MatrixForm


1   0  0
0  -5  0
0   0  1


ET2.A // MatrixForm


  1    2    3    4
-25  -30  -35  -40
  9   10   11   12


(FT2 = {{1, 0, 0}, {0, -1/5, 0}, {0, 0, 1}}) // MatrixForm


1    0   0
0  -1/5  0
0    0   1


ET2.FT2 // MatrixForm


1  0  0
0  1  0
0  0  1


Example 3.2.3. Now on to the third and final row operation of multiplying a row by a
nonzero number c and adding it to another row. Let us now multiply row 1 of the matrix
A above by -3 and add it to row 2. This should put the result in row 2 and leave the other
rows the same. A type III elementary matrix results when this row operation is applied
to $I_3$. For this instance, we have

$$
E = \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

The row operation that undoes this one multiplies row 1 of the matrix A above by 3 and
adds it to row 2 with matrix

$$
F = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

so that $FE = EF = I_3$.


(ET3 = {{1, 0, 0}, {-3, 1, 0}, {0, 0, 1}}) // MatrixForm


 1  0  0
-3  1  0
 0  0  1


ET3.A // MatrixForm


1   2   3   4
2   0  -2  -4
9  10  11  12


(FT3 = {{1, 0, 0}, {3, 1, 0}, {0, 0, 1}}) // MatrixForm


1  0  0
3  1  0
0  0  1


ET3.FT3 // MatrixForm


1  0  0
0  1  0
0  0  1




In summary, we have introduced the important concept of elementary 
matrices, which has many theoretical uses in linear algebra. Gauss—Jordan 
elimination and rref can be done through left multiplication of the 
augmented matrix of the linear system by successive elementary matrices, 
one elementary matrix applied for each row operation done. Next, we give 
the formal definition of an elementary matrix, although you undoubtedly 
know what it’s going to be by now. 


Definition 3.2.1. An elementary matrix E of size k x k is simply the
modified k x k identity matrix $I_k$ after exactly one elementary row
operation has been applied to it.
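Definition 3.2.1 translates almost word for word into Mathematica: start from IdentityMatrix[k] and apply one row operation to it. The following is a minimal sketch, not code from the text; the helper names swapRows, scaleRow, and addMultiple are hypothetical names of our own choosing.

```mathematica
(* hypothetical helpers: each applies exactly one row operation to IdentityMatrix[k] *)

(* type I: swap rows i and j *)
swapRows[k_, i_, j_] := Module[{m = IdentityMatrix[k]},
  m[[{i, j}]] = m[[{j, i}]]; m]

(* type II: multiply row i by the nonzero number c *)
scaleRow[k_, i_, c_] := Module[{m = IdentityMatrix[k]},
  m[[i]] = c m[[i]]; m]

(* type III: add c times row i to row j *)
addMultiple[k_, i_, j_, c_] := Module[{m = IdentityMatrix[k]},
  m[[j]] = m[[j]] + c m[[i]]; m]

A = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};
swapRows[3, 1, 3].A // MatrixForm
scaleRow[3, 2, -5].A // MatrixForm
addMultiple[3, 1, 2, -3].A // MatrixForm
```

The three products reproduce the row operations of Examples 3.2.1 through 3.2.3, since swapRows[3, 1, 3], scaleRow[3, 2, -5], and addMultiple[3, 1, 2, -3] are the matrices called ET1, ET2, and ET3 there.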


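Example 3.2.4. Now, let us do an example where we use multiple elementary matrices to
find the rref matrix of the following matrix A. Consider the augmented matrix

$$
A = \left(\begin{array}{rrr|r} 1 & 2 & -1 & 6 \\ 2 & -1 & 3 & 7 \\ 0 & 4 & 5 & -2 \end{array}\right) \tag{3.4}
$$

We will perform Gauss-Jordan elimination using successive left multiplications by
elementary matrices. Notice that we need to get rid of $A_{2,1} = 2$ first. To do this, we would
take row 1, multiply it by -2, and add it to row 2. We would write it as

$$
E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

If we perform the multiplication $E_1 A$, we get

$$
E_1 A = \left(\begin{array}{rrr|r} 1 & 2 & -1 & 6 \\ 0 & -5 & 5 & -5 \\ 0 & 4 & 5 & -2 \end{array}\right) \tag{3.5}
$$

Now, we want to get rid of $A_{1,2}$ and $A_{3,2}$, but before we do this, we will multiply row 2
by $-\frac{1}{5}$ to make our life easier. So

$$
E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

and we get

$$
E_2 E_1 A = \left(\begin{array}{rrr|r} 1 & 2 & -1 & 6 \\ 0 & 1 & -1 & 1 \\ 0 & 4 & 5 & -2 \end{array}\right) \tag{3.6}
$$

Now, we can zero out entry $A_{1,2}$ by taking -2 times row 2 and adding it to row 1.
Similarly, for $A_{3,2}$ we take -4 times row 2 and add it to row 3. The elementary matrices
are:

$$
E_3 = \begin{pmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
E_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -4 & 1 \end{pmatrix}
$$

Performing the multiplication gives

$$
E_4 E_3 E_2 E_1 A = \left(\begin{array}{rrr|r} 1 & 0 & 1 & 4 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 9 & -6 \end{array}\right) \tag{3.7}
$$

We are getting close to an answer now. The next step is to divide row 3 by 9. So

$$
E_5 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{9} \end{pmatrix}
$$

with

$$
E_5 E_4 E_3 E_2 E_1 A = \left(\begin{array}{rrr|r} 1 & 0 & 1 & 4 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & -\frac{2}{3} \end{array}\right) \tag{3.8}
$$

Last, we wish to zero out $A_{1,3}$ and $A_{2,3}$. To do this, we add minus row 3 to row 1, and row
3 to row 2. Thus, $E_6$ and $E_7$ are given, respectively, by:

$$
E_6 = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
E_7 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
$$

Finally, after multiplication by 7 elementary matrices, we have

$$
E_7 E_6 E_5 E_4 E_3 E_2 E_1 A = \left(\begin{array}{rrr|r} 1 & 0 & 0 & \frac{14}{3} \\ 0 & 1 & 0 & \frac{1}{3} \\ 0 & 0 & 1 & -\frac{2}{3} \end{array}\right) \tag{3.9}
$$

There is one thing to notice and consider. If you want to speed up the process a bit,
consider the last two matrices, $E_6$ and $E_7$. We could combine these two together in a new
matrix

$$
E_{76} = E_7 E_6 = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
$$

such that

$$
E_{76} E_5 E_4 E_3 E_2 E_1 A = E_7 E_6 E_5 E_4 E_3 E_2 E_1 A \tag{3.10}
$$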


We will go through the same steps but with Mathematica this time: 


(A = {{1, 2, -1, 6}, {2, -1, 3, 7}, {0, 4, 5, -2}}) // MatrixForm


(E1 = {{1, 0, 0}, {-2, 1, 0}, {0, 0, 1}}) // MatrixForm


 1  0  0
-2  1  0
 0  0  1


E1.A // MatrixForm


(E2 = {{1, 0, 0}, {0, -1/5, 0}, {0, 0, 1}}) // MatrixForm


1    0   0
0  -1/5  0
0    0   1


E2.E1.A // MatrixForm


(E3 = {{1, -2, 0}, {0, 1, 0}, {0, 0, 1}}) // MatrixForm


E3.E2.E1.A // MatrixForm

1  0   1   4
0  1  -1   1
0  4   5  -2


(E4 = {{1, 0, 0}, {0, 1, 0}, {0, -4, 1}}) // MatrixForm

1   0  0
0   1  0
0  -4  1

E4.E3.E2.E1.A // MatrixForm

1  0   1   4
0  1  -1   1
0  0   9  -6

(E5 = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1/9}}) // MatrixForm

1  0   0
0  1   0
0  0  1/9


E5.E4.E3.E2.E1.A // MatrixForm


(E6 = {{1, 0, -1}, {0, 1, 0}, {0, 0, 1}}) // MatrixForm


1  0  -1
0  1   0
0  0   1


E6.E5.E4.E3.E2.E1.A // MatrixForm

1  0   0  14/3
0  1  -1     1
0  0   1  -2/3


(E7 = {{1, 0, 0}, {0, 1, 1}, {0, 0, 1}}) // MatrixForm


E7.E6.E5.E4.E3.E2.E1.A // MatrixForm


1  0  0  14/3
0  1  0   1/3
0  0  1  -2/3


Next, we will have Mathematica perform the RowReduce command to verify that our
application of elementary matrices was correct.


RowReduce[A] // MatrixForm


1  0  0  14/3
0  1  0   1/3
0  0  1  -2/3


Finally, in regard to the comment about replacing $E_7$ and $E_6$ with $E_{76}$, we find that if we
replace the matrix product $E_7 E_6$ with the single matrix $E_{76}$, we will get the same result:


(E76 = {{1, 0, -1}, {0, 1, 1}, {0, 0, 1}}) // MatrixForm


1  0  -1
0  1   1
0  0   1


E76.E5.E4.E3.E2.E1.A // MatrixForm


1  0  0  14/3
0  1  0   1/3
0  0  1  -2/3
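The whole chain of multiplications in this example can also be collapsed into a single product and compared with RowReduce in one step. The sketch below is our own packaging of the seven matrices from the example; the names steps and prod are assumptions introduced only for this illustration.

```mathematica
A = {{1, 2, -1, 6}, {2, -1, 3, 7}, {0, 4, 5, -2}};

(* the seven elementary matrices E1, ..., E7 of Example 3.2.4, in the order applied *)
steps = {
   {{1, 0, 0}, {-2, 1, 0}, {0, 0, 1}},
   {{1, 0, 0}, {0, -1/5, 0}, {0, 0, 1}},
   {{1, -2, 0}, {0, 1, 0}, {0, 0, 1}},
   {{1, 0, 0}, {0, 1, 0}, {0, -4, 1}},
   {{1, 0, 0}, {0, 1, 0}, {0, 0, 1/9}},
   {{1, 0, -1}, {0, 1, 0}, {0, 0, 1}},
   {{1, 0, 0}, {0, 1, 1}, {0, 0, 1}}};

(* the matrix applied last stands leftmost, so reverse the list before multiplying *)
prod = Dot @@ Reverse[steps];

prod.A == RowReduce[A]
```

The comparison returns True. Note the Reverse: the product is E7.E6.E5.E4.E3.E2.E1, with the first row operation sitting closest to A.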




Homework Problems 


1. Given a system of n equations in n variables, what would the 
maximum number of left multiplications by elementary matrices be 
to convert the original augmented matrix representation of the 
system to rref form? 

2. The example used in this section was a three-equation,
three-variable system. Can elementary matrices be used on
nonsquare systems? What are the restrictions?


3. Use left multiplication by elementary matrices to reduce the
following systems of equations as far as possible. Also determine
whether the resulting matrix is in rref form:


(a) -2x + 5y = 1        (b) -7x + 2y = 5      (c) -2x + 6y + 3z = 4
    2x - 3y = 7             6x + 3y = 4           x - 3y + z = 6
                                                  2x + 5y + 6z = 1


(d) 2x + 3y - 5z = 7    (e) -2x + 3z = 4      (f) w + 2x - y + 3z = 4
    3x + 2y + 7z = 8        -3y + 2z = 6          -w + 2x + 4z =
    4x + 6y + 2z = 1        2x + 5y = 1           w - x - 3y + z = 6
                                                  x - 2y + 5z = 1


4. Use left multiplication by elementary matrices to reduce the
following systems of equations to rref form (or as close as
possible). You may leave your answer in rref matrix form.


(a) -2x + 6y + 3z = 4   (b) -2x - 5y = -2     (c) w + x + z = 1
    x - 3y + z =            2x + 3y = 7           x + 3z = 4
                            3x + 2y = 8           6y + 2z =
(d) 2x + 3y = 7
    3x + 2y = 8
    x - y = 1


5. Use the elementary matrices from part (a) of problem 3 on the
following two corresponding systems and explain what this implies.


(a) -2x + 5y = -4       (b) -2x + 5y = -1
    2x - 3y = 3             2x - 3y = -2




6. (a) Give examples, $E_1$, $E_2$, and $E_3$, of each of the three types of 4
x 4 elementary matrices.
(b) Next, find for each of the elementary matrices $E_1$, $E_2$, and
$E_3$ of part (a) another elementary matrix $F_1$, $F_2$, and $F_3$ of the
same type and size as the corresponding E so that $F_k E_k = I_4$ and
$E_k F_k = I_4$ for k = 1, 2, 3. Each $F_k = E_k^{-1}$, that is, each $F_k$ is $E_k$'s
multiplicative inverse and vice versa.
(c) Compute rref$(E_k | I_4)$ for k = 1, 2, 3, where $(E_k | I_4)$ is the 4 x 8
augmented matrix. What is the result of these three rref's?
7. (a) Give examples $E_1$, $E_2$, $E_3$ of each of the three types of 4 x 4
elementary matrices.
(b) Next, compute $E_1^2, E_2^2, E_3^2$ and $E_1^3, E_2^3, E_3^3$. Now give a
general formula for $E_1^m, E_2^m, E_3^m$, where m is any positive
integer.
(c) If you did problem 6, then find a general formula for
$E_1^m, E_2^m, E_3^m$, where m is any negative integer.
(d) Can you put parts (b) and (c) of this problem together to get
a general formula for $E_1^m, E_2^m, E_3^m$, where m is any integer?
8. Let E and F be two 3 x 3 elementary matrices. What must be true
about E and F so that EF = FE, that is, so that E and F commute?
9. (a) Which type of elementary matrix is always a diagonal matrix?
A diagonal matrix is a square matrix A, where all entries $A_{i,j} = 0$
when $i \neq j$.
(b) Which type of elementary matrix is always an upper or
lower triangular matrix? An upper or lower triangular matrix is
a square matrix A where, in the upper triangular case, all entries
$A_{i,j} = 0$ when i > j, while in the lower triangular case all entries
$A_{i,j} = 0$ when i < j.


Mathematica Problems 


1. Verify the matrix multiplications used in homework problem 3. 
2. Verify the matrix multiplications used in homework problem 4. 


3. Use the RowReduce command to row reduce each of the
matrices created from the systems in homework problem 4.
Compare the reduced-row matrices to your final answers from the
original problem.


4. Use left multiplication by elementary matrices to solve the
following systems of equations given in their augmented matrix
form:
2 -5 
(a) 3 2 
=i ol 
2 -3 
8 -4 
FET Fd 
-3 6 


—3 


aoc © 


8 
3 
—4 
-6 


2 
=] 
-4 


5. Use Mathematica to do homework problem 6 for any-size 
elementary matrix. 


6. Use Mathematica to do homework problem 7 for any-size 
elementary matrix. 


7. Use Mathematica to do homework problem 8 for two elementary
matrices, E and F, of the same size.


3.3 Sensitivity of Solutions to Error in the Linear System


We have spent a reasonable amount of time and effort attempting to solve 
linear systems of equations where we assume that the values in the system 
are exact, thus without any error. One might expect that if the left-hand 
portion of an augmented matrix B is very close to another augmented 
matrix A, both of which have the same last column, then both systems will 
have very similar solutions. 


Example 3.3.1. As an example, consider the following two systems. The first we will call
the exact system, which is given by

$$
\begin{aligned}
6x + 9y &= 5 \\
-2x + 7y &= 3
\end{aligned}
$$

We will denote the second system as the approximate system, and it is given as follows:

$$
\begin{aligned}
6.1x + 9y &= 5 \\
-2x + 7y &= 3
\end{aligned}
$$

In matrix form, these two systems are given by

$$
\left(\begin{array}{rr|r} 6 & 9 & 5 \\ -2 & 7 & 3 \end{array}\right), \qquad
\left(\begin{array}{rr|r} 6.1 & 9 & 5 \\ -2 & 7 & 3 \end{array}\right)
$$

respectively. We can solve both of these systems of equations in Mathematica using the
Solve command (we will skip the matrix solution representation for now):


N[Solve[{6x + 9y == 5, -2x + 7y == 3}, {x, y}]]
{{x -> 0.133333, y -> 0.466667}}


N[Solve[{6.1x + 9y == 5, -2x + 7y == 3}, {x, y}]]
{{x -> 0.131796, y -> 0.466227}}


It is apparent that the two solutions are very close. However, this is not always the case, 
so we will begin investigating how a small error in the variable coefficients or the RHS 
values of the equations can affect the solutions to a linear system. 


Example 3.3.2. We will now solve the two almost identical linear systems with the exact
system as

$$
\begin{aligned}
187x + 790y &= 5 \\
201x + 850y &= 37
\end{aligned} \tag{3.11}
$$

and the approximate system as

$$
\begin{aligned}
187x + 790y &= 5 \\
201.1x + 850y &= 37
\end{aligned} \tag{3.12}
$$

Once again, we will use Mathematica to solve each system; however, in this example, we
will work strictly with the matrices themselves and not the equations that they were
derived from:


(Ae = {{187, 790, 5}, {201, 850, 37}}) // MatrixForm

187  790   5
201  850  37

N[RowReduce[Ae]] // MatrixForm

1.  0.  -156.125
0.  1.   36.9625

(Aa = {{187, 790, 5}, {201.1, 850, 37}}) // MatrixForm

187    790   5
201.1  850  37

N[RowReduce[Aa]] // MatrixForm


1.  0.  -308.395
0.  1.   73.0062


The solutions to the exact system (3.11) and approximate system (3.12), respectively, are
{x = -156.125, y = 36.9625} and, to the digits shown, {x = -308.395, y = 73.0062}.


The solution to the approximate system has values almost double those of the solution to 
the exact system. This is clearly interesting, and the behavior is completely different from 
that in the first example. Since in real life there is almost always error in a linear system 
because its values are derived from measurements with inherent error, we must know 
whether it is possible to avoid this sensitivity to error and also what causes it. 


Since we used a two-dimensional system, it can be visualized easily in the plane. So let 
us solve for y in each equation of the exact system (3.11): 


Expand[N[Solve[187 x + 790 y == 5, y]]]


{{y -> 0.00632911 - 0.236709 x}}


Expand[N[Solve[201 x + 850 y == 37, y]]]


{{y -> 0.0435294 - 0.236471 x}}


Note that these lines are very close to being parallel, since they have nearly identical 
slopes. This may be the cause of our sensitivity to slight amounts of error since a slight 
change in the slope of one line can move their intersection point a great distance from the 
original intersection point. If you find this hard to visualize, try the exercise with a pair of 
chopsticks. 
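To see this effect numerically, we can nudge the coefficient 201 by a few small amounts and watch the solution move. This is only an illustrative sketch: the function name perturbed, the variable d, and the chosen list of perturbations are our own, and exact fractions are used so that no rounding enters before the final N.

```mathematica
(* perturb the x coefficient of the second equation by d and solve again *)
perturbed[d_] := N[RowReduce[{{187, 790, 5}, {201 + d, 850, 37}}][[All, -1]]]

(* each row lists the perturbation d followed by the solution {x, y} *)
Table[{N[d], perturbed[d]}, {d, {0, 1/20, 1/10, 3/20, 1/5}}] // TableForm
```

The rows for d = 0 and d = 1/10 reproduce the two solutions of Example 3.3.2, and the values grow rapidly as d approaches 16/79, or about 0.2025, which is the perturbation at which the two lines become exactly parallel.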


Example 3.3.3. Now let us measure and plot the error in our intersection point for two
almost parallel lines as a function of the change in slope of the second line. The error in
our linear system's solution will be the distance in the xy-plane between the exact
solution and the solution with an error in its slope from the first line.


Our system of linear, almost parallel, equations will be defined as

$$
\begin{aligned}
0.23671x + y &= 0.00633 \\
(0.23647 + R)x + y &= 0.04353
\end{aligned} \tag{3.13}
$$

where the error in the slope is R. The exact system is the one for R = 0, and they become
parallel lines when R = 0.00024 with no solution to the system. When R = 0, the solution
to the system is as given below:


Ae = {{0.23671, 1, 0.00633}, {0.23647, 1, 0.04353}};
RAe = RowReduce[Ae];
ExactSoln = RAe[[All, -1]]


{-155., 36.6964}


The exact solution's x coordinate is given by ExactSoln[[1]] and its y coordinate by
ExactSoln[[2]]. For $R \neq 0$, if we denote $x_a$ and $y_a$ as solutions to the approximate system,
then to compute the distance from the exact solution, we use the distance formula given
by

$$
D = \sqrt{(x_a - \text{ExactSoln}[[1]])^2 + (y_a - \text{ExactSoln}[[2]])^2} \tag{3.14}
$$


Instead of computing this by hand, we will have Mathematica do it for us and then graph 
the results: 


As = {{0.23671, 1, 0.00633}, {0.23647 + R, 1, 0.04353}};
ApproxSoln = RowReduce[As][[All, -1]]


{-(0.0372/(0.00024 - 1. R)),
 (0.010304 - 0.00633 (0.23647 + R))/(0.00024 - 1. R)}


ErrorFunc = Sqrt[(ExactSoln[[1]] - ApproxSoln[[1]])^2 + (ExactSoln
[[2]] - ApproxSoln[[2]])^2];

Plot[ErrorFunc, {R, -1, 1}, PlotRange -> {{-0.3, 0.3}, {120, 180}},
PlotStyle -> {{Red, Thickness[0.01]}}, AxesOrigin -> {-0.3, 120}]


Figure 3.1: Plot of error for -0.3 < R < 0.3.


In Figure 3.1, the plot of the error (or distance) between the exact solution and the 
approximate system solutions has a vertical asymptote at R = 0.00024, which is the value 
of R that makes the lines parallel, and so system (3.13) is without a solution. The closer R 
is to this value, the larger the error. Note that this error is not a linear but an inverse 
relation. As R moves away from 0.00024, the error stabilizes at a value close to 160. 
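The plateau near 160 can be checked with a short computation, offered here as a sketch rather than as part of the original calculation. For large |R| the x coordinate -0.0372/(0.00024 - R) of the approximate solution tends to 0, and the first equation of (3.13) then forces y toward 0.00633. The names limitPoint and exactPoint below are our own.

```mathematica
(* point that the approximate solution approaches as |R| grows *)
limitPoint = {0, 0.00633};

(* exact solution of system (3.13) for R = 0, as computed in the text *)
exactPoint = {-155., 36.6964};

(* distance between the two points *)
Norm[limitPoint - exactPoint]
```

The distance comes out at roughly 159.3, in agreement with the level at which the error curve in Figure 3.1 flattens.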


Let us also plot these solutions as a parametric plot in the xy-plane to see their behavior 
more clearly. The solutions form a line of points in our plot. Notice how quickly the dot 
moves at the beginning (R = 0) versus the end of the animation near R = 0.00024. Figure 
3.2 depicts one frame in the animation sequence, namely, when R = 0.000187. 


SolutionPlot[r_] := Block[
  {originalsolution, currentsolution, r1solution, distance,
   pointgraphics, errorplot, pointplot, x, y, r1},

  r1solution = {x, y} /. Solve[{23671/100000 x + y == 633/100000,
      (23647/100000 + r1) x + y == 4353/100000}, {x, y}][[1]];

  originalsolution = {x, y} /. Solve[{23671/100000 x + y == 633/
      100000, (23647/100000) x + y == 4353/100000}, {x, y}][[1]];

  currentsolution = r1solution /. r1 -> r;

  distance = Norm[r1solution - originalsolution];

  pointgraphics = Graphics[{Arrowheads[.1], PointSize[.04], Red,
     Point[originalsolution], Blue, Point[currentsolution], Green,
     Arrow[{originalsolution, currentsolution}]}];

  pointplot = Show[pointgraphics, PlotRange -> {{-1000, -150},
     {0, 250}}, Axes -> True, AxesLabel -> {x, y}, AspectRatio -> 2/3,
    ImageSize -> 250];

  Return[Show[GraphicsArray[{{Show[pointplot, AxesOrigin ->
        {-1000, 0}]}}]]]]


Manipulate[SolutionPlot[r], {{r, 0}, 0, .0002, .000001},
 SaveDefinitions -> True]


Figure 3.2: Animation of the parameterization of solutions in terms of R.


Example 3.3.4. In our next example, we look at a system of three equations in three
unknowns. The exact system is

$$
\begin{aligned}
5x - 2y + 8z &= 15 \\
-3x + 9y + 11z &= -4 \\
2x + 7y + 18.7z &= 20
\end{aligned} \tag{3.15}
$$

while the approximate system is

$$
\begin{aligned}
5x - 2y + 8z &= 15 \\
-3x + 9y + 11z &= -4 \\
2x + 7y + 18.9z &= 20
\end{aligned} \tag{3.16}
$$

As before, we will add a variable R into system (3.15) in order to look at error:

$$
\begin{aligned}
5x - 2y + 8z &= 15 \\
-3x + 9y + 11z &= -4 \\
2x + 7y + (18.7 + R)z &= 20
\end{aligned} \tag{3.17}
$$


When R = 0.3, this system has no solution since, for this value of R, the last equation is 
the sum of the first two equations with a different RHS value. We will use the 
three-dimensional distance formula, similar to the two-dimensional one given in (3.14), 
and perform computations that mirror those done in the two-dimensional system of the 
previous example: 


Ae = {{5., -2, 8, 15}, {-3, 9, 11, -4}, {2, 7, 18.7, 20}};
RAe = RowReduce[Ae];
ExactSoln = RAe[[All, -1]]


{75.5641, 61.4103, -30.}

Aa = {{5., -2, 8, 15}, {-3, 9, 11, -4}, {2, 7, 18.9, 20}};
RAa = RowReduce[Aa];
ApproxSoln = RAa[[All, -1]]


{220.179, 182.949, -90.}


(As = {{5., -2, 8, 15}, {-3, 9, 11, -4}, {2, 7, 18.7 + R, 20}}) //
MatrixForm


 5.  -2  8         15
-3    9  11        -4
 2    7  18.7 + R  20


ApproxSoln = RowReduce[As][[All, -1]]


{3.25641 - 4230./(-58.5 + 195. R), 0.641026 - 3555./(-58.5 + 195. R),
 1755./(-58.5 + 195. R)}
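The common denominator -58.5 + 195 R vanishes exactly at R = 0.3, which is the value singled out in the discussion of system (3.17). As a quick sketch of why no solution exists there (exact fractions are used in place of decimals, which is our own choice), we can compare the left-hand sides and then hand the R = 0.3 system to Solve:

```mathematica
(* for R = 3/10 the coefficient rows of equations 1 and 2 add up to that of equation 3 *)
{5, -2, 8} + {-3, 9, 11} == {2, 7, 187/10 + 3/10}

(* the right-hand sides disagree (15 + (-4) = 11, not 20), so the system is inconsistent *)
Solve[{5 x - 2 y + 8 z == 15, -3 x + 9 y + 11 z == -4,
  2 x + 7 y + 19 z == 20}, {x, y, z}]
```

The comparison returns True, and Solve returns the empty list {}, confirming that the system has no solution when R = 0.3.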


ErrorFunc = Sqrt[(ExactSoln[[1]] - ApproxSoln[[1]])^2 + (ExactSoln
[[2]] - ApproxSoln[[2]])^2 + (ExactSoln[[3]] - ApproxSoln[[3]])^2];


Plot[ErrorFunc, {R, -1, 1}, PlotRange -> {{-1, 1}, {0, 1000}},
PlotStyle -> {{Red, Thickness[0.01]}}, AxesOrigin -> {0, 0},
AspectRatio -> 2/3]


Figure 3.3: Plot of error for -1 < R < 1.


It is clear from the plot of the error shown in Figure 3.3 that we have a vertical asymptote
at R = 0.3. Let us also plot these solutions as a parametric plot in the xyz-space $\mathbb{R}^3$ to see
their behavior more clearly. The solutions form a line of points in our plot, similar to the
previous two-dimensional problem. Notice the change in speed of the dot as the
parameter R tends to the asymptotic value of R = 0.3. Figure 3.4 depicts one frame in this
animation.


SolutionPlot3D[r_] := Block[
  {originalsolution, currentsolution, r1solution, distance,
   pointgraphics, errorplot, pointplot, x, y, z, r1},

  r1solution = {x, y, z} /. Solve[{5 x - 2 y + 8 z == 15, -3 x + 9 y
       + 11 z == -4, 2 x + 7 y + (18.7 + r1) z == 20}, {x, y, z}][[1]];

  originalsolution = {x, y, z} /. Solve[{5 x - 2 y + 8 z == 15, -3 x
       + 9 y + 11 z == -4, 2 x + 7 y + 18.7 z == 20}, {x, y, z}][[1]];

  currentsolution = r1solution /. r1 -> r;

  distance = Norm[r1solution - originalsolution];

  pointgraphics = Graphics3D[{Arrowheads[.1], PointSize[.05],
     Thickness[0.01], Red, Point[originalsolution], Blue,
     Point[currentsolution], Green,
     Arrow[{originalsolution, currentsolution}]}];

  pointplot = Show[pointgraphics, PlotRange -> {{0, 500}, {0, 500},
     {-200, 0}}, Axes -> True, AspectRatio -> 1, ImageSize -> {200, 200},
    ViewPoint -> {12, -10, 8}];

  Return[Show[GraphicsArray[{{Show[pointplot]}}]]]]


Manipulate[SolutionPlot3D[r], {{r, 0}, 0, 0.25, .001},
 SaveDefinitions -> True]


Figure 3.4: Parameterization of solutions in terms of R. Above, R = 0.24.


Our analysis above indicates that sensitivity to error in the variable 
coefficients occurs when at least one of the equations in the linear system 
can be “almost” formed from the other equations through a series of row 
operations made on the equations ignoring the RHS values of the 
equations. In a two-dimensional problem, this corresponds to having two 
lines that have nearly the same slope. In the three-dimensional setting, this 
corresponds to two planes having very similar planar slopes. However, in 
the three-dimensional setting, it is also possible that all three equations 
might have similar planar slopes. 


Example 3.3.5. We now consider the following system:

$$
\begin{aligned}
3x + 2y - 4z &= 2 \\
-6x - 4y + 8.1z &= -1 \\
3x + 1.95y - 4z &= 3
\end{aligned} \tag{3.18}
$$

Solving this system gives


Ae = {{3, 2, -4, 2}, {-6, -4, 8.1, -1}, {3, 1.95, -4, 3}};
RAe = RowReduce[Ae];
ExactSoln = RAe[[All, -1]]


{54., -20., 30.}


Notice that the LHS of the second and third equations in system (3.18) are very close to
multiples of the first equation. So we will rewrite the system of equations to reflect this:

$$
\begin{aligned}
3x + 2y - 4z &= 2 \\
3x + 2y - (4 + R)z &= \tfrac{1}{2} \\
3x + (2 + S)y - 4z &= 3
\end{aligned} \tag{3.19}
$$


The exact system is found when S = -0.05 and R = 0.05. We will now determine the
distance from the exact solution in a slightly different manner, but in a way that is
equivalent to what we have done in the previous two examples. First, we will solve the
above system of equations for arbitrary values of R and S. To do this, we will use the
RowReduce command on its augmented matrix:


(As = {{3, 2, -4, 2}, {3, 2, -(4 + R), 1/2}, {3, (2 + S), -4, 3}})
// MatrixForm


3  2      -4      2
3  2      -4 - R  1/2
3  2 + S  -4      3

Notice that if R = 0 or S=0, there will be no solution since the denominator in the 
solution for at least one variable would be zero. This situation corresponds to at least two 
of the equations in system (3.19) having the same LHS, but different RHS. 


RAs = RowReduce[As];
ApproxSoln = RowReduce[As][[All, -1]]


{2/3 + 2/R - 2/(3 S), 1/S, 3/(2 R)}

ApproxSoln /. {R -> 0.05, S -> -0.05}
{54., -20., 30.}
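The form of this solution, and the reason R = 0 or S = 0 leaves no solution, can also be seen by hand. The following short derivation is a sketch that uses only the three equations of system (3.19): subtract the second equation from the first, then the first from the third, and substitute back into the first.

```latex
\begin{aligned}
(3x + 2y - 4z) - \bigl(3x + 2y - (4 + R)z\bigr) &= 2 - \tfrac{1}{2}
  &&\Longrightarrow\quad Rz = \tfrac{3}{2}, \quad z = \frac{3}{2R}, \\
\bigl(3x + (2 + S)y - 4z\bigr) - (3x + 2y - 4z) &= 3 - 2
  &&\Longrightarrow\quad Sy = 1, \quad y = \frac{1}{S}, \\
3x = 2 - 2y + 4z
  &&&\Longrightarrow\quad x = \frac{2}{3} + \frac{2}{R} - \frac{2}{3S}.
\end{aligned}
```

If R = 0, the first line reads 0 = 3/2, and if S = 0, the second reads 0 = 1, so in either case the system is inconsistent.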


The distance (hence the error) from the exact solution $(x_e, y_e, z_e)$ to the approximate
solution $(x_a, y_a, z_a)$ is given by

$$
d = \sqrt{(x_e - x_a)^2 + (y_e - y_a)^2 + (z_e - z_a)^2} \tag{3.20}
$$

The approximate solution depends on two variables, S and R, so instead of getting a
one-dimensional curve, we get a two-dimensional surface, which can be seen in Figure
3.5.


ErrorFunc = Sqrt[(ExactSoln[[1]] - ApproxSoln[[1]])^2 + (ExactSoln
[[2]] - ApproxSoln[[2]])^2 + (ExactSoln[[3]] - ApproxSoln[[3]])^2];


Plot3D[ErrorFunc, {R, -0.3, 0.3}, {S, -0.3, 0.3}, Mesh -> None,
AspectRatio -> 2/3, ClippingStyle -> None]


Figure 3.5: Graph of the error for —0.3 < R, S < 0.3. 




The graph in Figure 3.5 can be rotated in many directions to get a better idea of the 
behavior of the error as a function of R and S. In the contour plot depicted in Figure 3.6, 
integer contour values from 1 to 10 are given. Error values that range from 0 to 1 fall in 
the darkest region, then values from 1 to 2 in the next ring, and so on, until the lightest 
value corresponds to an error greater than 10. Note that one can specify the desired 
contours and the corresponding colors as well. 


ContourPlot[ErrorFunc, {R, 0.04, 0.06}, {S, -0.06, -0.04}, Contours -> {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}]


Figure 3.6: Graph of the error for 0.04 < R < 0.06 and -0.06 < S < -0.04.


The contour plot shows how quickly the error builds up the farther the values of S and R
are away from the exact system values given by S = -0.05 and R = 0.05.


The moral of our story on sensitivity to error in a linear system is that it
occurs mainly when at least one equation of the system is very close to
being a linear combination of the other equations of the system.
Geometrically, you should feel that sensitivity to error happens when at
least one of the equations' plots is almost parallel to the plot of some
linear combination of the other equations.
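
One way to see this for system (3.19) is to look at the determinant of its coefficient matrix, which measures how close the equations are to being dependent. The following is only a sketch: the lowercase names r and s stand in for R and S, and coeff and detRS are hypothetical names chosen so that nothing defined earlier is overwritten.

```mathematica
(* Coefficient matrix of system (3.19); r and s stand in for R and S *)
coeff = {{3, 2, -4}, {3, 2, -(4 + r)}, {3, 2 + s, -4}};

(* Symbolic determinant; it simplifies to 3 r s *)
detRS = Simplify[Det[coeff]]

(* Value at the exact system, r = 0.05 and s = -0.05 *)
detRS /. {r -> 0.05, s -> -0.05}
```

The determinant is 3rs, so it vanishes when either parameter is zero and is only about -0.0075 at the exact system, which is consistent with the large errors seen in the plots.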


Homework Problems 


Consider the following systems of equations: 




(a) -x + 0.048y = 6           (b) -x + 0.049y = 6
    2x - 0.1y = 24                2x - 0.1y = 24


(c) -x + 0.048y = 6.1         (d) -x + 0.048y = 6
    2x - 0.1y = 24                2.01x - 0.1y = 24


We will denote system (a) as the exact system, while (b), (c), and (d) will 
be approximate systems. Answer the following questions: 
1. Solve all four systems for the variables x and y. 
2. Compute the Euclidean distance between the solution to the 
actual system and each of the three approximate systems. 
3. The Frobenius norm of an m x n matrix A (with potentially 
complex entries) is one way to measure the magnitude, or length, of 
a matrix. It is given by the following formula: 


\| A \|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 }


The Frobenius distance between two matrices A and B of the same
size is \| A - B \|_F. Use this definition of the Frobenius norm to
compute the distance between the augmented matrix corresponding 
to the exact system and the augmented matrices corresponding to 
approximate systems. 

4. One might expect that the corresponding Frobenius norms from 
problem 3 should be arranged in the same order as the distances 
found in problem 2. Can you come up with a reason as to why this 
is not the case? 
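
As a hedged illustration of the Frobenius distance defined in problem 3, the sketch below uses two made-up 2 x 3 matrices (matA and matB are hypothetical and are not taken from the systems above) and a helper frobenius written directly from the definition. It then compares the result with the built-in Norm using the "Frobenius" option.

```mathematica
(* Hypothetical 2 x 3 matrices, chosen only for illustration *)
matA = {{1, 2, 3}, {4, 5, 6}};
matB = {{1, 2, 3}, {4, 5, 8}};

(* Frobenius norm written out from the definition:
   square root of the sum of |entry|^2 over all entries *)
frobenius[m_] := Sqrt[Total[Abs[Flatten[m]]^2]]

(* Distance between the two matrices, from the definition *)
frobenius[matA - matB]

(* The same distance from the built-in norm *)
Norm[matA - matB, "Frobenius"]
```

Here the two matrices differ in a single entry by 2, so both expressions evaluate to 2.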


Consider the following systems of equations: 




(a) 6w - 3x + 2y = 7          (b) 6w - 3x + 2y = 7
    w - 2x + 4y = 5               w - 2x + 4y = 5
    4w + x - 7y = 0               4w + x - 6.1y = 0


(c) 6w - 3x + 2y = 7          (d) 6w - 3x + 2y = 7
    w - 2x + 4y = 5               w - 2x + 4y = 5
    5w + x - 7y = 0               4w + x - 7y = -2


As before, we will denote system (a) as the exact system, while (b),
(c), and (d) will be approximate systems. Answer the following
questions:


5. Solve all four systems for the variables w, x, and y.


6. Compute the Euclidean distance between the solution to the exact 
system and each of the three approximate systems. 


7. Compute the Frobenius norm of the distance between the 
augmented matrix corresponding to the exact system and the 
augmented matrices corresponding to approximate systems. 


Mathematica Problems 


Consider the following generalized two equation systems: 


(a) ax + by = c                   (b) ax + by = c
    max + (mb + ε)y = d               max + (mb + 1/ε)y = d

(c) ax + by = c                   (d) ax + by = c
    max + m(b + ε)y = d               max + m(b + 1/ε)y = d


1. As an example, let a = 5, b = -6, c = 1, and d = -1. Explore how
the solutions to each of these systems vary depending on the values
of m and ε. Included in your exploration should be graphs of the
difference between solutions to these systems dependent on the
parameters m and ε. Graphical depictions of the Frobenius norm
defined in homework problem 3 should also be included.

2. Repeat the process from problem 1, but this time, pick a couple 
of different values for d, some close to c, and some further away 
from c. How does this affect your graphs from problem 1? 

For the remaining questions for this section, we will focus on the 
following system of equations: 


3x - 5y + 6z = -10
-2x + 4y - 7z = 1
5(1 + R)x - (28/3)(1 - RS)y + (29/2)(R + S)z = -3


where the exact system corresponds to R = 1 and S= 0. 

3. Solve the system of equations for the variables x, y, and z. 

4. Determine all values of R and S that cause this system to have no 
solution. 

5. Construct a three-dimensional graph of the Euclidean distance 
between the actual solution (R = 1 and S = 0) and approximate 
solutions in terms of R and S. Describe your resulting graph and how
it corresponds to your answers to problems 3 and 4. 

6. Construct a contour plot of the Euclidean distance between the 
actual solution (R = 1 and S = 0) and approximate solutions in terms 
of R and S. Describe your resulting graph and how it corresponds to 
your answers to problems 3 and 4. 




Chapter 4


Applications of Linear Systems and Matrices


4.1 Applications of Linear Systems to Geometry


We begin this chapter with an application of linear systems to geometry. 
In this section, we will find the equations of conic sections that have a 
specified set of points on their graphs. Conic sections are ellipses, 
hyperbolas, and parabolas, which are all generated by intersecting planes 
with right circular cones. Their general equations are all of the form 


(4.1) Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0


for constants A through F with A, B, and C not all being zero at the same
time. The quantity B^2 - 4AC is called the discriminant of the conic section
since the conic section is an ellipse when B^2 - 4AC < 0, a hyperbola when
B^2 - 4AC > 0, and a parabola if B^2 - 4AC = 0.
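
The discriminant test can be written as a small function. This is a minimal sketch: conicType is a hypothetical helper, its arguments are lowercase because C is a protected symbol in Mathematica, and it assumes exact or otherwise comparable numeric coefficients. It ignores the degenerate cases, so it only reports what the sign of the discriminant says.

```mathematica
(* conicType is a hypothetical helper, not a built-in function.
   a, b, c are the coefficients of x^2, x y, and y^2 in (4.1). *)
conicType[a_, b_, c_] := Which[
  b^2 - 4 a c < 0, "ellipse",
  b^2 - 4 a c > 0, "hyperbola",
  True, "parabola"]

conicType[1, 0, 1]   (* x^2 + y^2 + ... : discriminant -4 *)
conicType[1, 0, -1]  (* x^2 - y^2 + ... : discriminant 4 *)
conicType[1, 2, 1]   (* x^2 + 2 x y + y^2 + ... : discriminant 0 *)
```

The three calls return "ellipse", "hyperbola", and "parabola", respectively.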


The equation of a conic section (4.1) is said to be in standard form if B = 
0, and the conic section itself is then said to be in standard position since it 
will have its axes parallel to the xy-coordinate axes. A standard form conic 
section is then an ellipse if AC > 0 (where you get a circle when A = C), a 
parabola if AC = 0, and a hyperbola if AC < 0. As well, the equation of a 
circle is always in standard form with B = 0 and A = C. There are also 
degenerate cases to equation (4.1). Consider the following examples: 




(4.2) x^2 + y^2 + 1 = 0,   (x - 3)^2 + (y - 7)^2 = 0,   (2x - 3y + 1)(x + y - 6) = 0


The first of these has no solution, the second is the single point (3, 7),
while the third describes a pair of lines in R^2 (its left-hand side is a product of
two linear factors) but is also in the form of (4.1) when expanded.


We will start with the most familiar of the conic sections, the circle. To 
arrive at the equation of a circle, we simply recognize that the values of A 
and C in (4.1) must be the same, and can thus divide through the entire 
equation by that quantity. This gives us the following equation: 


(4.3) x^2 + y^2 + Dx + Ey + F = 0


We should remember that every circle, with center at the point (H,K) and 
radius R, can also be expressed by the equation 


(x - H)^2 + (y - K)^2 = R^2


Since both of these equations for a circle contain three unknown constants 
D, E, and F, or H, K, and R, it should take three noncollinear points in the 
xy-plane to determine a unique circle passing through them. Notice that 
equation (4.3) can also be expressed as 


Dx + Ey + F = -(x^2 + y^2)


which clearly indicates that the general equation of a circle is a linear 
equation in the three unknowns: D, E, and F. Since each of the three given 
points must satisfy the equation of the circle, we end up with the following 
linear system of equations in terms of the unknown variables D, E, and F: 


(4.4) Dx_1 + Ey_1 + F = -(x_1^2 + y_1^2)
      Dx_2 + Ey_2 + F = -(x_2^2 + y_2^2)
      Dx_3 + Ey_3 + F = -(x_3^2 + y_3^2)


It is very easy to forget that the system shown above has only three
unknowns, D, E, and F; just remember that the x_i and y_i values are the
coordinates of the three points that our circle must pass through.
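
System (4.4) can also be set up directly in matrix form. The sketch below is only illustrative: the three points in pts are made up, and the names coeffMatrix, rhs, dd, ee, and ff are hypothetical (dd, ee, ff stand for D, E, F, since D and E are protected symbols in Mathematica).

```mathematica
(* Three hypothetical sample points, used only for illustration *)
pts = {{0, 0}, {4, 0}, {0, 2}};

(* Rows {x_i, y_i, 1} of the coefficient matrix of (4.4) *)
coeffMatrix = Map[{#[[1]], #[[2]], 1} &, pts];

(* Right-hand sides -(x_i^2 + y_i^2) *)
rhs = Map[-(#[[1]]^2 + #[[2]]^2) &, pts];

(* Solve the 3 x 3 linear system for the unknowns D, E, F *)
{dd, ee, ff} = LinearSolve[coeffMatrix, rhs]

(* Center (H, K) and radius R recovered from D, E, F *)
center = {-dd/2, -ee/2}
radius = Sqrt[dd^2/4 + ee^2/4 - ff]
```

For these sample points the solution is D = -4, E = -2, F = 0, which gives the circle with center (2, 1) and radius Sqrt[5].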




Example 4.1.1. By using system (4.4), we will find the circle passing through the three
points P(-7, 9), Q(2, -13), and T(8, 5), depicted in Figure 4.1. In addition, we will locate
the coordinates (H, K) of the circle's center, and its radius R.


P = {-7, 9}; Q = {2, -13}; T = {8, 5};
PtsPlot = Graphics[{PointSize[0.05], Blue, Point[{P, Q, T}]}, Axes -> True, PlotRange -> {{-20, 12}, {-20, 15}}];


TxtPlot = Graphics[{Black, Text["P", {-8, 10}], Text["Q", {2, -11}], Text["T", {9, 6}]}];


EqnCircle = d x + e y + f == -(x^2 + y^2)

f + d x + e y == -x^2 - y^2

Eqn1 = EqnCircle /. {x -> P[[1]], y -> P[[2]]}
-7 d + 9 e + f == -130

Eqn2 = EqnCircle /. {x -> Q[[1]], y -> Q[[2]]}
2 d - 13 e + f == -173

Eqn3 = EqnCircle /. {x -> T[[1]], y -> T[[2]]}
8 d + 5 e + f == -89


Soln = Solve[{Eqn1, Eqn2, Eqn3}, {d, e, f}]


{{d -> 179/49, e -> 169/49, f -> -(6638/49)}}


CircleEqn = (EqnCircle /. Soln)[[1]]


-(6638/49) + (179 x)/49 + (169 y)/49 == -x^2 - y^2


CircPlot = ContourPlot[Evaluate[{CircleEqn}], {x, -20, 12}, {y, -20, 15}, ContourStyle -> {{Red, Thickness[0.01]}}, PlotRange -> All, Axes -> True, Frame -> False, AspectRatio -> 1];


Show[CircPlot, PtsPlot, TxtPlot]


Figure 4.1: The unique circle passing through P, Q, and T. 




To put the circle given by CircleEqn into standard form, we will use the following
function, CompleteSquare, which takes a polynomial expression such as the one found in
CircleEqn, together with a variable, and completes the square in that variable. The function
CompleteSquare was taken from Wolfram's MathWorld website [1]. On the MathWorld
website, you can find many other Mathematica functions and notebooks that may help
you throughout this text and other mathematical subjects:


CompleteSquare[f_, x_] := Module[{a, b, c},
  {c, b, a} = CoefficientList[f, x];
  a (x + b/2/a)^2 + Simplify[(c - b^2/4/a)]]

StdForm = CompleteSquare[CompleteSquare[-CircleEqn[[2]] + CircleEqn[[1]], x], y] == 0


-(680825/4802) + (179/98 + x)^2 + (169/98 + y)^2 == 0


Using the CompleteSquare function, we have found that the center of the circle is
located at coordinates (-179/98, -169/98) with radius \sqrt{680825/4802}.


Mathematica had no problem solving this linear system for D, E, and F to get the
equation of this circle. Now let us see if Mathematica can solve for H, K, and R if we use
the standard form for the equation of a circle:


(x - H)^2 + (y - K)^2 = R^2


Notice that this form directly involves the center, but is also not linear in terms of H, K,
and R (all three variables have highest degree two).




EqnCircleTwo = (x - H)^2 + (y - K)^2 == R^2;
Eqn4 = EqnCircleTwo /. {x -> P[[1]], y -> P[[2]]}


(-7 - H)^2 + (9 - K)^2 == R^2

Eqn5 = EqnCircleTwo /. {x -> Q[[1]], y -> Q[[2]]}
(2 - H)^2 + (-13 - K)^2 == R^2

Eqn6 = EqnCircleTwo /. {x -> T[[1]], y -> T[[2]]}
(8 - H)^2 + (5 - K)^2 == R^2


SolnTwo = Solve[{Eqn4, Eqn5, Eqn6}, {H, K, R}]


{{H -> -(179/98), K -> -(169/98), R -> -Sqrt[680825/4802]}, {H -> -(179/98), K -> -(169/98), R -> Sqrt[680825/4802]}}


EqnCircleTwo /. N[SolnTwo[[2]]]
(1.82653 + x)^2 + (1.72449 + y)^2 == 141.779
N[StdForm]


-141.779 + (1.82653 + x)^2 + (1.72449 + y)^2 == 0


Now we see that both of these methods find the same equation for a circle. The Solve
command presumably used a form of Gaussian elimination to solve the first system of
equations since it was linear in the unknowns D, E, and F. The second system is not
linear in the variables H, K, and R, and since Solve returned exact values it must have
relied on symbolic techniques for polynomial systems rather than on a numerical
iteration. A numerical alternative for such a system is the system of equations version
of the Newton-Raphson method. For more information on the Newton-Raphson algorithm
for finding approximate solutions to nonlinear systems of equations, visit the book's
companion website.
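
For comparison, the nonlinear system can also be attacked numerically with FindRoot, which by default uses a Newton-type iteration. The sketch below makes several assumptions: the lowercase names h, k, and r replace H, K, and R (K is a protected symbol, so it is safer not to hand it to FindRoot), and the starting guess is arbitrary.

```mathematica
(* The three points of Example 4.1.1 *)
pts = {{-7, 9}, {2, -13}, {8, 5}};

(* One equation (x_i - h)^2 + (y_i - k)^2 == r^2 per point *)
eqs = Map[(#[[1]] - h)^2 + (#[[2]] - k)^2 == r^2 &, pts];

(* Numerical solution starting from the rough guess h = 0, k = 0, r = 10 *)
sol = FindRoot[eqs, {{h, 0}, {k, 0}, {r, 10}}]
```

With a positive starting value for r, the iteration should settle near h = -1.82653, k = -1.72449, and r close to 11.907, in agreement with the exact values found by Solve. A negative starting value for r would be expected to lead to the other root, r close to -11.907.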




Now we turn our attention to a somewhat different problem involving 
three points. A plane in space is uniquely determined by passing through 
three noncollinear points. On the other hand, the general equation of a 
plane in xyz-space is 


(4.5) Ax + By + Cz = D


for constants A through D, which is a linear equation in these four 
unknowns. Clearly, our three points will generate three equations only in 
these four unknowns, which indicates that we will get an infinite number 
of solutions to this linear system since the solution will have dimension 
one. We know that the plane is unique, so we should be able to reduce the 
problem to three equations in the three unknowns in order to obtain a 
single solution. 


In order to resolve our dilemma, we must realize that the constant D, on
the RHS of the planar equation, is 0 exactly when the plane goes through
the origin (0,0,0), and is otherwise nonzero. If we plot our three points and
it is clear that the plane through them does not go through the origin (this
is most likely to be the case), then D ≠ 0 and we can divide equation (4.5)
by D, yielding


Ex + Fy + Gz = 1


for constants E, F, and G. Now we are in business, and can solve this
square linear system of three equations in the three unknowns E, F, and G,
where each equation is determined by going through one of the three
given points. If the system of equations has no solution, then the plane
passes through the origin and we must instead set the right-hand side to
zero.


Example 4.1.2. As an example, we will now attempt to find and plot the equation of the
plane through the three points P(5, -8, 13), Q(-7, 2, 4), and R(16, 10, -9):




P = {5, -8, 13}; Q = {-7, 2, 4}; R = {16, 10, -9};

PtsPlot = Graphics3D[{PointSize[0.06], Blue, Point[{P, Q, R}]}, Axes -> True, PlotRange -> {{-20, 20}, {-20, 20}, {-20, 20}}, Boxed -> False];

TxtPlot = Graphics3D[{Black, Text["P", {5, -8, 14}], Text["Q", {-7, 2, 5}], Text["R", {16, 10, -8}]}, Boxed -> False];
EqnPlane = e x + f y + g z == 1;

EqnPlane1 = EqnPlane /. {x -> P[[1]], y -> P[[2]], z -> P[[3]]}


5 e - 8 f + 13 g == 1


EqnPlane2 = EqnPlane /. {x -> Q[[1]], y -> Q[[2]], z -> Q[[3]]}
-7 e + 2 f + 4 g == 1


EqnPlane3 = EqnPlane /. {x -> R[[1]], y -> R[[2]], z -> R[[3]]}
16 e + 10 f - 9 g == 1


Soln = Solve[{EqnPlane1, EqnPlane2, EqnPlane3}, {e, f, g}]


{{e -> 1/28, f -> 363/1624, g -> 163/812}}


PlaneEqn = (EqnPlane /. Soln)[[1]]


x/28 + (363 y)/1624 + (163 z)/812 == 1


PlanePlot = ContourPlot3D[Evaluate[PlaneEqn], {x, -20, 20}, {y, -20, 20}, {z, -20, 20}, Mesh -> None, ContourStyle -> Red, Boxed -> False];


Show[PtsPlot, TxtPlot, PlanePlot, Axes -> True, Boxed -> True]


Figure 4.2: The plane fitting the points P, Q, and R. 


The plane is plotted in Figure 4.2, along with the three points. The origin lies below the
plane, and thus clearly the plane does not pass through the origin, but does pass through
all three points P, Q, and R, as required.
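
The same plane can be found in matrix form, and the determinant of the matrix of points gives a quick test for whether the plane misses the origin. This is a sketch under the assumption that the three points are noncollinear; the lowercase names p, q, and r are hypothetical and are used only to avoid overwriting P, Q, and R.

```mathematica
(* The three points of Example 4.1.2 under hypothetical lowercase names *)
p = {5, -8, 13}; q = {-7, 2, 4}; r = {16, 10, -9};

(* A nonzero determinant means e x + f y + g z == 1 has a unique
   solution, so the plane does not pass through the origin *)
Det[{p, q, r}]

(* Coefficients {e, f, g} of the plane e x + f y + g z == 1 *)
LinearSolve[{p, q, r}, {1, 1, 1}]
```

The determinant is -1624, which is nonzero, and LinearSolve returns {1/28, 363/1624, 163/812}, matching the result of Solve above.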


Returning to our discussion of conic sections, it can be shown that any
five points in the plane define a unique conic since two distinct conics can
intersect in at most four points. Hence, all six constants, A through F, can
somehow be solved for a given five points. The five points should be
plotted in order to determine which type of conic it is since if it is an
ellipse, then it has an x^2 term, and we can divide the entire equation (4.1)
by A. If the conic is a parabola, it must have either an x^2 or a y^2 term,
allowing us to divide by A or C. A hyperbola must have at least one of A,
B, or C nonzero.


Example 4.1.3. Now we will have Mathematica find the equations and graphs of the
conic sections through four fixed points, and a fifth variable point chosen from the circle
of radius 1 with center at (0, 1/2). As the fifth point rotates around the circle, the conic
section changes from one type to another. For the four fixed points, we will pick the
corners of the unit square: {[0,1], [1,1], [1,0], [0,0]}.


Note that we break this process up into two functions. The first, ConicThrough5Points,
finds the equation of the conic that passes through the given five points; the second,
ConicFrame, calls the ConicThrough5Points function, takes its result, and plots it
along with the five points for which it was defined along with the circle of radius 1 with
center at (0, 1/2). Figure 4.3 is a frame in the animation which shows a solution with
algebraic coefficients. Further, note that there is no slider in this manipulation. To move
the fifth point around the circle, simply click the mouse anywhere on the graph and the
procedure will use the point on the given circle closest to the point you selected in the
xy-plane as the fifth point to determine the unique conic section.




ConicThrough5Points[a_, b_, c_, d_, e_, x_, y_] := Block[
  {eqns, conic, b1, c1, d1, e1, f1},

  (* One equation per point; the x^2 coefficient is normalized to 1 *)
  eqns = (x^2 + b1 x y + c1 y^2 + d1 x + e1 y + f1 == 0) /. {
    {x -> a[[1]], y -> a[[2]]}, {x -> b[[1]], y -> b[[2]]}, {x -> c[[1]], y -> c[[2]]},
    {x -> d[[1]], y -> d[[2]]}, {x -> e[[1]], y -> e[[2]]}};

  conic = (x^2 + b1 x y + c1 y^2 + d1 x + e1 y + f1 == 0) /.
    Solve[eqns, {b1, c1, d1, e1, f1}][[1]];

  Return[conic]];


ConicFrame[pt1_] := Block[

  {rawgraphics, conic, conicplot, x, y, pt},

  (* Project the selected point onto the circle of radius 1 centered at (0, 1/2) *)
  pt = {0, 1/2} + (pt1 - {0, 1/2})/Norm[pt1 - {0, 1/2}];

  rawgraphics = {PointSize[.02], Black, Point[{{0, 0}, {0, 1}, {1, 0}, {1, 1}}], Red, Point[pt], Black, Circle[{0, 1/2}, 1]};

  conic = ConicThrough5Points[{1, 1}, {0, 0}, {1, 0}, {0, 1}, pt, x, y];

  conicplot = ContourPlot[Evaluate[conic], {x, -5, 5}, {y, -5, 5}, Frame -> False, Axes -> True, AxesOrigin -> {0, 0}, PlotPoints -> 75];

  Return[Show[conicplot, Graphics[rawgraphics], PlotLabel -> conic]]]


Manipulate[ConicFrame[spot], {{spot, {-1/3, 3/2}}, Locator, Appearance -> None}, SaveDefinitions -> True]


Figure 4.3: The Manipulate command uses the two functions
ConicFrame and ConicThrough5Points to construct the unique
conic section that passes through five given points.
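
For the four corner points used in this example, the change of type can also be worked out by hand. The derivation below is a sketch that reuses the coefficient names b1 through f1 from ConicThrough5Points and writes the fifth point as (p, q), a notation introduced only here.

```latex
% Substitute the four corners into x^2 + b_1 xy + c_1 y^2 + d_1 x + e_1 y + f_1 = 0
(0,0):\ f_1 = 0, \qquad
(1,0):\ 1 + d_1 + f_1 = 0 \ \Rightarrow\ d_1 = -1, \\
(0,1):\ c_1 + e_1 + f_1 = 0 \ \Rightarrow\ e_1 = -c_1, \qquad
(1,1):\ 1 + b_1 + c_1 + d_1 + e_1 + f_1 = 0 \ \Rightarrow\ b_1 = 0. \\
% The conics through the four corners form a one-parameter family;
% the fifth point (p, q) fixes the remaining coefficient c_1
x^2 - x + c_1\,(y^2 - y) = 0, \qquad
c_1 = -\frac{p^2 - p}{q^2 - q} \quad (q \neq 0,\ q \neq 1). \\
% Discriminant of (4.1) with A = 1, B = 0, C = c_1
B^2 - 4AC = -4c_1 .
```

So the conic is an ellipse when c1 > 0 and a hyperbola when c1 < 0. When c1 = 0, which happens when the fifth point has p = 0 or p = 1, the equation reduces to x(x - 1) = 0, a degenerate pair of parallel lines.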




Homework Problems 


1. Find equations of the circles that pass through the following sets 
of points: 


(a) {(0, -1 - \sqrt{3}), (1, 1), (1 + \sqrt{3}, -2)}
(b) {(3, 7), (3, 1), (6, 4)}


2. Find equations of the planes that pass through the following sets 
of points: 


(a) {(1, -2, -5), (-3, -2, 1), (-3, -1, 3)}


(b) {(-4, 3, 8), (-6, -2, 1), (-3, 0, 3)}




3. Find the equation of the plane that passes through the following 
points: 


{(1, -2, 11), (-1, 1, -7), (2, 1, 2)}


4. A sphere of radius r, centered at the point (a, b, c), can be 
expressed by the equation 


(x - a)^2 + (y - b)^2 + (z - c)^2 = r^2

Construct a linear system of equations that can be solved to find the
sphere that fits a set of data points in R^3. How many points are
required to determine a unique sphere?

5. Find the equations of the spheres that fit the following sets of
points:


(a) {(2, 3, 3 - 2\sqrt{3}), (4, 1 + 2\sqrt{3}, 3), (2, 1, -1), (6, 1, 3)}


(b) {(-3, -1, -3), (0, -4, 1), (-2, 0, 1), (1, -1, 1)}


6. Explain what happens when you attempt to find the equation of a 
conic for which three of the given points are collinear. 


7. (See Section 4.4 for more details on rotations in the plane.) Let
(x', y') be a new coordinate system that is a rotation about the origin
through the angle θ of the standard (x, y) Cartesian coordinate
system. (This is actually a rotation about the origin of the x and y
axes to produce the new x' and y' axes, respectively.) Then these
two coordinate systems are related by the equations


x' = cos(θ) x - sin(θ) y


y' = sin(θ) x + cos(θ) y


or the single matrix equation


\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos(θ) & -\sin(θ) \\ \sin(θ) & \cos(θ) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}


Let


Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0


be a conic section in the xy-coordinate system. Find the equation of this
conic section


A'x'^2 + B'x'y' + C'y'^2 + D'x' + E'y' + F' = 0


in the x'y'-coordinate system. In particular, find formulas for the
coefficients A' through F' in terms of A through F and θ. Also, show that
the discriminants of both equations are equal:


B'^2 - 4A'C' = B^2 - 4AC


Also, find a formula in terms of A through F for the angle θ that makes B'
= 0. Now find this angle θ and corresponding values of A' through F' for
the conic given by


4x^2 + 6xy - 2y^2 + 7x + y - 1 = 0


Mathematica Problems 


1. Given the points {(-5, 0), (-2, 8), (0, 5)}, find the equations of the two
parabolas going through the three points whose axes of symmetry are
parallel to the x-axis and to the y-axis, respectively. Also graph both
parabolas with the points.

2. The points {(-l1, —1), (0,4), (0, -4), (2,2), (2, —2)} lie on an 
ellipse. Find the equation of the ellipse. 

3. Find the hyperbola that passes through the points 


{(-1, 2), (0, 3), (1, 1), (2, 4), (-2, 3)}


and plot both the hyperbola and the points together on the same 
graph. 

4. Using the points in problem 3, attempt to find the equation of the 
ellipse that passes through them. 

5. Plug the points from problem 3 directly into the equation for the 
general conic section given in (4.1) and compare your result to that 
of problem 3. (Hint: Assume A ≠ 0.)

6. Using the Mathematica functions that generated the image in
Figure 4.3, attempt to determine what happens when the fifth point
that lies on the circle becomes collinear with a pair of points out of
the four given.

7. Rerun the code that generated Figure 4.3, but this time, create a 
circle that lies completely inside the box generated by the four 
predetermined points. Points on this circle will be the location for 
the fifth point on the conic. Describe the changes, if any, in the 
overall behavior of the conic sections when the fifth point lies 
entirely inside the box. 

8. Find the equation of the line through the two points (3, -8) and
(-6, 11) using the solution to a linear system. Plot this line along
with these two points.

9. Find the equation of the line in xyz-space R^3 through the two
points (-5, 7, -11) and (9, -2, 4). Plot this line in space along with
these two points. (Hint: Think of this line as the line of intersection
of two planes.)

10. Redo homework problem 7 with as much of the work done by 
Mathematica as can be reasonably performed. 


4.2 Applications of Linear Systems to Curve Fitting


In Section 4.1, given a certain number of points, we were able to find a 
conic curve that passes through all the points. Next, we will attempt to 
generalize these ideas in an effort to fit a more generalized curve through 
a set of points. In particular, we will be interested in finding a linear 
combination of a certain type (e.g., trigonometric, exponential, or powers) 
of function whose graph will pass through all the points of a planar 
dataset. 


Definition 4.2.1. A function F(x) is a linear combination of the functions

{f_1(x), f_2(x), ..., f_n(x)}

if F(x) can be expressed as

(4.6) F(x) = a_1 f_1(x) + a_2 f_2(x) + \cdots + a_n f_n(x) = \sum_{k=1}^{n} a_k f_k(x)

for scalars a_1, a_2, ..., a_n.


As an example, it might be appropriate to use trigonometric functions, 
such as 


{cos(0x) = 1, cos(x), sin(x), cos(2x), sin(2x), ..., cos(nx), sin(nx)}


if the dataset is known to be from a source that has a wave format, such as 
sound (music), light (fiberoptic communication), or even stock prices. 
Notice that sin(0x) = 0 and is thus omitted. Exponential functions, such as 


{e^{0x} = 1, e^{x}, e^{-x}, e^{2x}, e^{-2x}, ..., e^{nx}, e^{-nx}}


would be used if the dataset were from a source that can be very large and 
perhaps very small simultaneously, such as population (people or bacteria) 
or even the national debt and has horizontal asymptotes. 


A polynomial is a linear combination of nonnegative powers of x, and so 
we begin with an example involving it. We want to find a quadratic 
polynomial y = ax^2 + bx + c whose graph passes through a dataset of three
points; that is, we want the standard position parabola through three 
noncollinear points. This will require that we solve a system of three 
linear equations (one for each point) in the three unknowns a, b, and c. 
Three points is the only number of points that give you a unique parabola. 
Consider, for instance, a line; a line is uniquely defined by two points. 
Given only one point, there are an infinite number of lines that go through 
it. If we have three points, the only way in which a line can go through all 
three is if they all lie on the same line; hence you would only need to use 
two points to determine the equation of the line. If the three points were 
not collinear, then no line would pass through them. So if we have three 
points, we can find a unique parabola that goes through them. Given only 
one or two points, there are an infinite number of parabolas that pass 
through them. Given four points, the only way that a parabola can pass 
through all four is if the points already lie on a predetermined parabola, in
which case any three of the points will give enough information for us to
find the equation of the parabola.


Example 4.2.1. As an example, we want to find the parabola through the three points
P(-21, 15), Q(9, -4), and T(37, 26):


P = {-21, 15}; Q = {9, -4}; T = {37, 26};


PtsPlot = Graphics[{PointSize[0.05], Blue, Point[{P, Q, T}]}, Axes -> True];


TxtPlot = Graphics[{Black, Text["P", {-21, 19}], Text["Q", {6, -2}], Text["T", {37, 20}]}];


EqnParabola = y == a x^2 + b x + c;
Eqn1 = EqnParabola /. {x -> P[[1]], y -> P[[2]]}


15 == 441 a - 21 b + c

Eqn2 = EqnParabola /. {x -> Q[[1]], y -> Q[[2]]}
-4 == 81 a + 9 b + c

Eqn3 = EqnParabola /. {x -> T[[1]], y -> T[[2]]}
26 == 1369 a + 37 b + c

Solns = Solve[{Eqn1, Eqn2, Eqn3}, {a, b, c}]


{{a -> 179/6090, b -> -(1709/6090), c -> -(559/145)}}


Parabola = (EqnParabola /. Solns[[1]])[[2]]


-(559/145) - (1709 x)/6090 + (179 x^2)/6090


ParabolaPlot = Plot[Parabola, {x, -25, 40}, PlotStyle -> {{Red, Thickness[0.01]}}];


Show[ParabolaPlot, TxtPlot, PtsPlot]




Figure 4.4: The parabola that passes through all the three data 
points. 




In Figure 4.4, we clearly see that the parabola fits the data exactly. On inspecting the
equations Eqn1, Eqn2, and Eqn3, we can determine the general form for finding the
coefficients a, b, and c to the parabola y = ax^2 + bx + c given points {(x_1, y_1), (x_2, y_2),
(x_3, y_3)}. Putting it in matrix form, we have


\left[\begin{array}{ccc|c}
x_1^2 & x_1 & 1 & y_1 \\
x_2^2 & x_2 & 1 & y_2 \\
x_3^2 & x_3 & 1 & y_3
\end{array}\right]


Performing Gauss-Jordan elimination on this matrix will solve for a, b, and c that
correspond to columns 1, 2, and 3, respectively.
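
As a sketch of this matrix approach, the augmented matrix for the three points of Example 4.2.1 can be built and row reduced directly. The names pts and aug are hypothetical and are used only in this illustration.

```mathematica
(* Points P, Q, and T from Example 4.2.1 *)
pts = {{-21, 15}, {9, -4}, {37, 26}};

(* Augmented matrix with rows {x_i^2, x_i, 1, y_i} *)
aug = Map[{#[[1]]^2, #[[1]], 1, #[[2]]} &, pts];

(* After Gauss-Jordan elimination the last column holds {a, b, c} *)
RowReduce[aug][[All, -1]]
```

The last column comes out as {179/6090, -1709/6090, -559/145}, the same coefficients a, b, and c that Solve produced.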


Example 4.2.2. Now let us look at an example where the function desired is a linear 


combination of seven exponential functions. The dataset we want the function to fit is the 
seven points 


{(0, 2), (20, 7), (40, 15), (60, 26), (80, 47), (100, 63), (120, 72)} 


which is population data taken every 20 years in millions of people. The linear 
combination we need to fit the data to is 


y = A + Be^{-x} + Ce^{x} + De^{-2x} + Ee^{2x} + Fe^{-3x} + Ge^{3x}


so that we get a square 7 x 7 linear system of equations in the unknowns A through G.
We will divide each x coordinate by 100 for the actual dataset used in order to prevent
overflow in our arithmetic since otherwise our exponents become too large. It is quite
normal to have to alter the dataset in order to obtain better results, but you should
remember that the resulting function corresponds to the modified dataset, and not the
original data:


DataSet = {{0, 2}, {.2, 7}, {.4, 15}, {.6, 26}, {.8, 47}, {1.0, 63}, {1.2, 72}};


DataPlot = Graphics[{PointSize[0.03], Blue, Point[DataSet]}, Axes -> True];


ExpForm[x_] := a + b E^(-x) + c E^x + d E^(-2 x) + e E^(2 x) + f E^(-3 x) + g E^(3 x);
Solns = Solve[ExpForm[DataSet[[All, 1]]] == DataSet[[All, 2]], {a, b, c, d, e, f, g}]


{{a -> -32050.3, b -> 42247.4, c -> 13136.3,
d -> -28641.5, e -> -2738.84, f -> 7820.5, g -> 228.396}}


SolnFunc = (ExpForm[x] /. Solns)[[1]]


-32050.3 + 7820.5 E^(-3 x) - 28641.5 E^(-2 x) +
42247.4 E^(-x) + 13136.3 E^x - 2738.84 E^(2 x) + 228.396 E^(3 x)


ExpPlot = Plot[SolnFunc, {x, -0.05, 1.25}, PlotStyle -> {{Red, Thickness[0.01]}}];


Show[ExpPlot, DataPlot]
Figure 4.5: Exponential function fitting the modified data. 




Now, we can get the linear combination that works for the original dataset by replacing x
by x/100:




ModExpSoln = SolnFunc /. {x -> x/100}


-32050.3 + 7820.5 E^(-3 x/100) - 28641.5 E^(-x/50) +
42247.4 E^(-x/100) + 13136.3 E^(x/100) - 2738.84 E^(x/50) + 228.396 E^(3 x/100)


Plotting the original data and the correctly adjusted function, as seen in Figure 4.6, gives 
the exact same graph as that depicted in Figure 4.5, but with the x-axis adjusted back to 
the correct scaling: 


ExpPlotMod = Plot[ModExpSoln, {x, 0, 120}, PlotStyle -> {{Red, Thickness[0.01]}}];


DataSetMod = Table[{100 DataSet[[i, 1]], DataSet[[i, 2]]}, {i, 1, Length[DataSet]}]


{{0, 2}, {20., 7}, {40., 15}, {60., 26}, {80., 47}, {100., 63}, {120., 72}}


DataPlotMod = Graphics[{PointSize[0.03], Blue, Point[DataSetMod]}, Axes -> True];


Show[ExpPlotMod, DataPlotMod]


Figure 4.6: Exponential function fitting the original data. 




Now let us attempt to generalize this process. If given a dataset and
function set


{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)},   {f_1(x), f_2(x), ..., f_n(x)}

respectively, we can fit the data to a function expressed as the linear
combination of functions, given in (4.6), by solving the system whose
augmented matrix is given by




(4.7)
\left[\begin{array}{cccc|c}
f_1(x_1) & f_2(x_1) & \cdots & f_n(x_1) & y_1 \\
f_1(x_2) & f_2(x_2) & \cdots & f_n(x_2) & y_2 \\
\vdots & \vdots & & \vdots & \vdots \\
f_1(x_n) & f_2(x_n) & \cdots & f_n(x_n) & y_n
\end{array}\right]


If there are more points in the dataset than functions, then we end up with
an overdetermined system that may not have a solution. In this case, the
best that can be done is to get as "close" as possible to all the points with
our resulting function. This can be accomplished by the method of least
squares, which will be covered in Section 11.2.
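
The square case of (4.7) can be sketched as a small general-purpose function. Note that fitCoefficients is a hypothetical helper, not a built-in command. It assumes the function set is given as a list of pure functions, that there are as many functions as data points, and that the resulting coefficient matrix is invertible.

```mathematica
(* fitCoefficients is a hypothetical helper, not a built-in function.
   funcs is a list of pure functions, data a list of {x, y} pairs. *)
fitCoefficients[funcs_List, data_List] := Module[{m},
  (* Row i of m holds f1[xi], f2[xi], ..., fn[xi] *)
  m = Table[f[x], {x, data[[All, 1]]}, {f, funcs}];
  (* Solve m . coefficients == y values *)
  LinearSolve[m, data[[All, 2]]]]

(* Function set {1, x, x^2} and the three points of Example 4.2.1;
   the result lists the coefficients in the order {c, b, a} *)
fitCoefficients[{1 &, # &, #^2 &}, {{-21, 15}, {9, -4}, {37, 26}}]
```

For the parabola of Example 4.2.1 this returns {-559/145, -1709/6090, 179/6090}, the earlier coefficients listed in the order of the function set. Swapping in other pure functions, such as Exp[-#] & or Sin[2 #] &, gives the exponential and trigonometric fits without changing the helper.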


Homework Problems 


1. If a set of data has two points that have the same x coordinate, but
different y coordinates, a problem occurs when we attempt to
perform Gauss-Jordan elimination. As an example, consider the
function y = Ax^2 + Bx + C, and the set of points {(1,1), (2,3), (1,2)}.
What is the problem, and why does it occur?

2. Set up, but do not solve, the matrix required to find the constants 
to fit the following data points to the corresponding functions: 


(a) {(0, 0), (1, 2), (-3, 4)}, {1, x, x^2}


(b) {(0, 0), (1, 2), (-3, 4), (-1, 5)}, {1, x, x^2, x^3}


(c) {(2,3), (1,2), (—3, 4), (—1, 5)}, TS 
(d) {(0, 1), (x, 2), (-F -1)}, {1, sin(x), cos(x)} 


3. So far, this section has been devoted to finding a one-dimensional
curve of the form y = a_1 f_1(x) + a_2 f_2(x) + ... + a_n f_n(x) given a set of n
data points of the form (x_i, y_i). Discuss how this method can be
extended to functions of two variables, given by z = a_1 f_1(x, y) +
a_2 f_2(x, y) + ... + a_n f_n(x, y), with n data points of the form (x_i, y_i, z_i).




4. Set up, but do not solve, the matrix required to find the constants 
to fit the following data points to the corresponding functions: 


(a) {(0, 1, 2), (1, 2, 4), (-1, 2, -1), (1, 0, -3)}, {1, x, y, xy}


(b) {(0, 1, 2), (1, 2, 4), (-1, 2, -1), (1, 0, -3)}, {1, x, y, (x - y)^2}


5. The Lagrange polynomial L(x) for a dataset D_n of n points given
by


{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}


for distinct x coordinates, is the smallest degree polynomial that
passes through all of the points of the dataset. What is the maximum
degree of the Lagrange polynomial L(x) passing through D_n?


Mathematica Problems 


1. Solve the systems from homework problems 2 and 4. Graph the 
points and the function on the same graph to verify that the solution 
curve does indeed fit the data. 


2. This is a continuation of homework problem 5.
(a) Find the equation of the Lagrange polynomial L(x) that
passes through the six points
{(-5.258, 104.0773128), (0, 3.14159), (-3.1, 44.58859),
(-1.6, 18.05359), (4.9, 43.46859), (2.3, 5.92459)}

(b) Now, plot together these six points and their Lagrange 
polynomial. 


3. This is a continuation of homework problem 5. 
(a) Find the data set of 10 points equally spaced on the graph of
y = sin(x) for x ∈ [0, π/2].
(b) Find the equation of the Lagrange polynomial L(x) that
passes through these 10 points.
(c) Now plot together these 10 points, their Lagrange
polynomial L(x), and y = sin(x) for x ∈ [0, π/2].
(d) The Lagrange polynomial L(x) approximates sin(x) for
angles in radians in the first quadrant. Compare the values of
sin(x) and L(x) for x = π/6, π/3, π/4. Do you believe that L(x) is a
very good approximation of sin(x) for angles in the first
quadrant? Explain your answer.


4. Let D_9 be the nine-point dataset given by
D_9 = {(-8, 3.9), (-6, -1.7), (-4, 5.5), (-2, 1.4),
(0, -3.2), (2, 4.2), (4, 0.3), (6, -2.8), (8, 5.1)}


(a) Find the linear combination F(x) of the trigonometric
functions


{1, sin(x), cos(x), sin(2x), cos(2x), sin(3x), cos(3x), sin(4x), cos(4x)}


whose graph passes through the points of D_9.
(b) Now plot D_9 and F(x) together.


4.3 Applications of Linear Systems to Economics


Now we switch from purely mathematical or physical applications to ones 
that are financial. Our two financial applications will look at investing for 
retirement and taxes. The business application we will look at is the 
Leontief input-output model. 


Example 4.3.1. Let us begin with taxes. We will call a tax system “fair” if there are no 
taxes paid on other taxes. In other words, you do not pay federal income tax on the 
amount you have paid on state, city, and property taxes, and the same is true with regard 
to the other taxes. Let us now assume that our tax system is fair and that you have a total 
taxable income of $57,650. You also pay a 19.3% federal tax, a 4.25% state tax, a 0.55% 
city tax, and a 2.35% property tax. What are the amounts of each tax paid, what 
percentage of your total taxable income is each tax, and what is your overall tax rate as a 
percentage of your total taxable income? Let x, y, z, and w respectively be your federal, 
state, city, and property tax amounts. Then we have the system of equations 


$$
\begin{aligned}
x &= 0.193\,(57650 - y - z - w) \\
y &= 0.0425\,(57650 - x - z - w) \\
z &= 0.0055\,(57650 - x - y - w) \\
w &= 0.0235\,(57650 - x - y - z).
\end{aligned}
$$


This system consists of four linear equations in four variables. Its coefficient matrix is
strictly diagonally dominant, so the system has a unique solution, which we now compute:


EqnFed = x == 0.193 (57650 - y - z - w);

EqnState = y == 0.0425 (57650 - x - z - w);

EqnCity = z == 0.0055 (57650 - x - y - w);

EqnProp = w == 0.0235 (57650 - x - y - z);

SolnRates = Solve[{EqnFed, EqnState, EqnCity, EqnProp}, {x, y, z, w}]


{{x → 10499.6, y → 1948.67, z → 242.799, w → 1056.54}}

FedRate = (x /. SolnRates[[1]])/57650
0.182126

StateRate = (y /. SolnRates[[1]])/57650
0.0338018

CityRate = (z /. SolnRates[[1]])/57650
0.0042116

PropRate = (w /. SolnRates[[1]])/57650
0.0183267

OverAllRate = FedRate + StateRate + CityRate + PropRate
0.238466


In this fair tax system, you end up paying roughly 24% of your total taxable income 
toward these four taxes. 
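
One way to gain confidence in a numerical solution is to substitute it back into the
equations. The following is a minimal sketch of that check for the fair tax system. The
names income, rates, taxes, eqns, soln, and residuals are illustrative choices made here
and do not appear in the example above.

```mathematica
(* Fair tax system of Example 4.3.1, rebuilt from its rates. *)
income = 57650;
rates = {0.193, 0.0425, 0.0055, 0.0235}; (* federal, state, city, property *)
taxes = {x, y, z, w};

(* Each tax is its rate times income minus the other three taxes. *)
eqns = Table[
   taxes[[i]] == rates[[i]] (income - Total[taxes] + taxes[[i]]), {i, 4}];
soln = First[Solve[eqns, taxes]];

(* Left side minus right side of each equation, evaluated at the solution. *)
residuals = Chop[(eqns /. Equal -> Subtract) /. soln]

(* Overall tax rate as a fraction of taxable income. *)
Total[taxes /. soln]/income
```

If the solution is correct, every residual should be zero up to roundoff, and the last
line should agree with OverAllRate.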




Example 4.3.2. As a second example, let us look at investing for retirement. When you 
retire, you discover that you have a total of $248,000 saved for retirement. You decide to 
invest the total amount of this savings among three types of investments. One is a simple 
interest savings account at 4.35% per year, another is a certificate of deposit at simple 
interest of 6.15% per year, and the third is a mutual fund at simple interest of 8.75% per 
year. You have decided that you want a yearly total yield of $19,500, and that three times 
as much should be invested in the mutual fund as the other two investments together. 
How much should you place in each type of investment? 


Let x, y, and z be the amounts placed in each of the investments of savings, certificates, 
and mutual fund, respectively. Then, we have the three linear equations: 


$$
\begin{aligned}
x + y + z &= 248000 \\
0.0435x + 0.0615y + 0.0875z &= 19500 \\
z &= 3(x + y)
\end{aligned}
$$


Now we solve for x, y, and z: 


EqnTotal = x + y + z == 248000; 

EqnYield = 0.0435 x + 0.0615 y + 0.0875 z == 19500; 
EqnTriple = z == 3 (x + y); 

Solve[{EqnTotal, EqnYield, EqnTriple}, {x, y, z}] 


{{x → 32666.7, y → 29333.3, z → 186000.}}


From these calculations, $32,666.67 should go into savings, $29,333.33 should go into 
certificates, and $186,000 must go into the mutual fund. 
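
The same three equations can also be written in matrix form and handed to LinearSolve.
This is only a sketch: the names coeff, rhs, and amounts are assumptions introduced here,
and the third equation has been rearranged to 3x + 3y − z = 0.

```mathematica
(* Rows: total invested, yearly yield, mutual fund = 3 (savings + certificates). *)
coeff = {{1, 1, 1}, {0.0435, 0.0615, 0.0875}, {3, 3, -1}};
rhs = {248000, 19500, 0};

(* Amounts for savings, certificates, and mutual fund, in that order. *)
amounts = LinearSolve[coeff, rhs]

(* Check the yearly yield produced by these amounts. *)
{0.0435, 0.0615, 0.0875} . amounts
```

The first result should agree with the output of Solve above, and the second should
return the target yield of 19500 up to roundoff.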


Now we get to the Leontief open input-output business/economic model. In
this model, we have several industries or companies that are mutual 
suppliers to each other of their products or output. Each company needs as 
input a certain dollar amount of the other companies’ outputs in order to 
meet their mutual production schedules and certain outside demands by 
customers other than these companies. This Leontief model is called 
“closed” if there are no outside demands for these companies’ products. In 
order to best see this idea in action, we look at the following example. 
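
In matrix language, the model can be summarized compactly. The symbols below are
notation assumed for this sketch rather than taken from the text: $A$ is the matrix
whose entry $a_{ij}$ is the dollar amount of product $i$ needed to make \$1 of product
$j$, $x$ is the vector of total outputs, and $d$ is the vector of outside demands.

```latex
x = A x + d
\quad\Longleftrightarrow\quad
(I - A)\,x = d
```

In the closed model there are no outside demands, so $d = 0$ and the system becomes
$(I - A)\,x = 0$.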


Example 4.3.3. A town has four companies that produce coal, gas, steel, and electricity. 
Each company buys the others’ products; that is, each company needs as input a certain 
dollar amount of the other companies’ outputs in order to meet its own production. Let us 
assume that the coal company, in order to produce $1 of coal, needs $0.01 of coal, $0.05 
of gas, $0.08 of steel, and $0.17 of electricity. Additionally, the gas company, in order to 
produce $1 of gas, needs $0.03 of coal, $0.01 of gas, $0.06 of steel and $0.14 of 
electricity. Also, the steel company, in order to produce $1 of steel needs $0.11 of coal,
$0.15 of gas, $0.02 of steel, and $0.09 of electricity. Finally, the electric company, in
order to produce $1 of electricity, needs $0.10 of coal, $0.23 of gas, $0.05 of steel, and 
$0.03 of electricity. Each month these four companies have outside demands from other 
companies and consumers of their products for $18 million of coal, $74 million of gas, 
$51 million of steel, and $106 million of electricity. What must the total monthly outputs 
be of these four companies to meet exactly these outside demands and their own mutual 
needs of each others’ products? How much of each company’s total monthly output is 
used internally by the four companies? 

Let c, g, s, and e be the total monthly dollar production of each of these companies. Then, 
we have the four production equations starting with coal, gas, steel, and electricity, 
respectively: 


$$
\begin{aligned}
c - (0.01c + 0.03g + 0.11s + 0.10e) &= 18{,}000{,}000 \\
g - (0.05c + 0.01g + 0.15s + 0.23e) &= 74{,}000{,}000 \\
s - (0.08c + 0.06g + 0.02s + 0.05e) &= 51{,}000{,}000 \\
e - (0.17c + 0.14g + 0.09s + 0.03e) &= 106{,}000{,}000
\end{aligned}
$$


This gives a square 4 x 4 linear system of equations for which we expect a unique 
solution: 


Soln = Solve[{c - (.01 c + .03 g + .11 s + .10 e) == 18000000,
   g - (.05 c + .01 g + .15 s + .23 e) == 74000000,
   s - (.08 c + .06 g + .02 s + .05 e) == 51000000,
   e - (.17 c + .14 g + .09 s + .03 e) == 106000000}, {c, e, g, s}]


{{c → 4.38508×10^7, e → 1.40839×10^8, g → 1.20315×10^8, s → 7.01724×10^7}}

Now, this tells us that total monthly coal production must be $43.8508 million, total
monthly gas production must be $120.3146 million, total monthly steel production must
be $70.1724 million, and finally total monthly electrical production must be $140.8394
million. Also, the internal use of coal, gas, steel, and electricity by these four companies
is $25.8508, $46.3146, $19.1724, and $34.8394 million, respectively. All these internal
costs can be computed by taking the difference of the total output and the outside
demands:


(c - 18000000) /. Soln[[1]]
2.58508×10^7

(e - 106000000) /. Soln[[1]]
3.48394×10^7

(g - 74000000) /. Soln[[1]]
4.63146×10^7

(s - 51000000) /. Soln[[1]]
1.91724×10^7
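
The internal use can also be computed directly rather than by subtraction. The sketch
below works in millions of dollars and assumes the ordering coal, gas, steel,
electricity. The names consumption, demand, output, and internal are hypothetical; the
rows of consumption are copied from the coefficients in the four production equations.

```mathematica
(* Entry (i, j): dollars of product i needed to make $1 of product j. *)
consumption = {{0.01, 0.03, 0.11, 0.10},
   {0.05, 0.01, 0.15, 0.23},
   {0.08, 0.06, 0.02, 0.05},
   {0.17, 0.14, 0.09, 0.03}};
demand = {18, 74, 51, 106}; (* outside demands, in millions of dollars *)

(* Total outputs solve (I - consumption).output == demand. *)
output = LinearSolve[IdentityMatrix[4] - consumption, demand]

(* Internal use is what the four companies consume of each product. *)
internal = consumption . output

(* This should match total output minus outside demand, up to roundoff. *)
Chop[internal - (output - demand)]
```

Working in millions keeps the numbers readable; multiplying output by 10^6 should
recover the values returned by Solve.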


Mathematica Problems 


1. Solve Example 4.3.1 without Solve; instead, use RowReduce on 
the augmented matrix of this linear system. 


2. Solve Example 4.3.2 without Solve; instead, use RowReduce on 
the augmented matrix of this linear system. 


3. Solve Example 4.3.3 without Solve; instead, use RowReduce on 
the augmented matrix of this linear system. 


4. Your great great uncle Wolfram has just left you an inheritance 
of $8,278,325; you got this money because you are his only living 
relative and you persuaded him that you would take care of his 10 
dogs. On receiving this money, you immediately spent $100,000 on 
yourself and then wisely decided to invest the rest. You have 
decided to invest the remaining amount in a combination of four 
different ways: a savings account earning simple interest of 3.55% 
annually, a certificate of deposit earning simple interest of 5.75% 
annually, a mutual fund earning simple interest of 7.25% annually, 
and a stock portfolio earning simple interest of 9.15% annually. 

You want to earn $525,000 annual interest on these investments, 
you want the mutual fund to be the sum of savings and the 
certificate of deposit, and you also want the stock portfolio to have 
one-fifth the sum of the other three investments. How much money 
should be invested in each of these four ways? Use RowReduce to 
solve the problem, and then verify your answer with the Solve 
command. 

5. In problem 4, increase the total annual earnings of $525,000 until 
the problem no longer has a positive solution for the four amounts
of your investments. What is the maximum total amount of annual
earnings that you can receive to the nearest dollar if all other 
conditions of the problem are kept the same, and for this maximum 
annual earnings, what are your four investment amounts to the 
nearest dollar? 


6. A small business has a taxable annual income of $1,439,535. It 
pays taxes under a fair tax system where there are no taxes on taxes. 
The federal income tax rate is 13.25%, the state income tax rate is 
4.15%, the county income tax rate is 0.35%, the city income tax rate 
is 0.15%, the property tax rate is 0.75%, and the school tax rate is 
0.25%. What must this business pay on each of these six taxes, and 
what is their overall fair tax rate? Use RowReduce to solve the
problem, checking your answer with Solve.


7. In problem 6, replace all of the tax rates along with the taxable 
annual income of the business by different letters. Use Mathematica 
to find a formula in terms of these letters for the business’ overall 
fair tax rate. 


8. Use the Leontief economic model for this problem. A huge 
multinational corporation has seven industrial production divisions: 
natural gas, oil, coal, steel, electricity, plastics, and mineral mining 
(denoted g, o, c, s, e, p, and m, respectively, in the table below). The 
corporation has determined that it takes the following amounts of 
each of these divisions to make $1 of a particular division’s product. 


g o c s e p m
$1 of g: 0.01 0.07 0.15 0.13 0.01
$1 of o: 0.02 0.01 0.00 0.03 0.05 0.11 0.04
$1 of c: 0.03 0.01 0.00 0.05 0.08 0.02 0.00
$1 of s: 0.04 0.01 0.07 0.02 0.09 0.01 0.02
$1 of e: 0.12 0.01 0.04 0.02 0.03 0.01 0.01
$1 of p: 0.05 0.07 0.00 0.01 0.04 0.01 0.01
$1 of m: 0.03 0.02 0.01 0.05 0.08 0.02 0.00


If the corporation has outside demands for $3B ($3 billion) of 
natural gas, $5B of oil, $1B of coal, $4B of steel, $10B of 
electricity, $8B of plastics, and $6B of minerals per week, then how 
much must be each division’s total production each week to meet 
all demands? How much of each division’s total production goes to
meet its internal demands per week? Use RowReduce to solve the
problem, checking your answer with Solve.


9. Find a general formula for the augmented matrix needed to solve 
a Leontief economic model in terms of the internal demand amounts 
as in the table for problem 8 and the outside demand amounts. 
Apply this formula to the Leontief example given in this section and 
then RowReduce it to see if you get the same answer. 


Research Projects 


1. Leontief won the Nobel Prize in Economics for his work. 
Research what Leontief did and see if the concept of linear systems 
was useful in his work. 


2. If you have some programming experience and/or are brave 
enough, write a Mathematica function that inputs the list L of your 
different fair tax rates and your total annual taxable income J, and 
outputs the individual fair tax amounts in the same order as the rates 
were given as well as your overall fair tax rate. 


3. If you have some programming experience and/or are brave 
enough, write a Mathematica function that inputs the information 
for a Leontief economic model and outputs the total production 
amounts for each division of the corporation in the same order the 
information was input. 


4.4 Applications of Matrix Multiplication to Geometry


In the plane and space, we can do rotations of objects about the origin
using matrix multiplication. We will begin with rotating a point P(x0, y0)
in the xy-plane about the origin through an angle θ. Here, simple
trigonometry will allow us to compute the coordinates of the new point
Q(x1, y1) after rotation by the angle θ. After this has been accomplished,
we can turn the process of rotation into matrix multiplication. To begin,
we recognize that the new point Q will be the same distance r from the
origin as the original point P. So both points P and Q lie on the circle with
center the origin and radius r. The point Q is just at an extra angle θ, as
measured from the positive x-axis, relative to the position of P on this
circle. If the point P is at angle φ, then the new point Q is at the angle φ +
θ. Since all the points on the circle of radius r with center at the origin can
be written as (r cos(α), r sin(α)) for position at angle α, measured from the
positive x-axis, we now know that the point P has coordinates (r cos(φ), r
sin(φ)) while the point Q has coordinates (r cos(φ + θ), r sin(φ + θ)).
Simple trigonometry, right?


Example 4.4.1. We will now illustrate this concept by rotating, through an
angle θ = 4π/5, the point P(5 cos(π/6), 5 sin(π/6)). The angle φ for the point P
is φ = π/6. Then the new point Q is at (5 cos(π/6 + 4π/5), 5 sin(π/6 + 4π/5)).


CircPlot = Graphics[{Blue, Thickness[0.01], Circle[{0, 0}, 5]}];

P = {5 Cos[π/6], 5 Sin[π/6]};

Q = {5 Cos[π/6 + 4 π/5], 5 Sin[π/6 + 4 π/5]};

Arc = Graphics[{Red, Thickness[0.015], Circle[{0, 0}, 5, {π/6, π/6
      + 4 π/5}]}];


Angles = Graphics[{Black, Thickness[0.010], Circle[{0, 0}, 1.8, {π/6,
     π/6 + 4 π/5}], Circle[{0, 0}, 2.5, {0, π/6}]}];


PtsPlot = Graphics[{PointSize[0.05], Black, Point[{P, Q}]}, Axes ->
    True];


TxtPlot = Graphics[{Black, Text["P", {5.3, 2.5}], Text["Q", {-5.5,
      0.5}], Text["θ", {-0.4, 0.7}], Text["φ", {1.7, 0.45}]}];


Lines = Graphics[{Black, Thickness[0.010], Line[{P, {0, 0}, Q}]}];
Show[CircPlot, Arc, PtsPlot, TxtPlot, Lines, Angles, Axes -> True]


Figure 4.7: Original point P and the rotated point Q. 




Now that we know from trigonometry how to go from the coordinates of the original
point P to the coordinates of the new rotated point Q, as depicted in Figure 4.7, we can
turn this process into a matrix multiplication. What we seek is a 2 × 2 rotation matrix A so
that Q = AP, where P and Q are written as column matrices. The key to finding the
rotation matrix A is again trigonometry; this time we need a trig identity. Recall that the
point P(x0, y0) also has coordinates (r cos(φ), r sin(φ)) while the point Q(x1, y1) has
coordinates (r cos(φ + θ), r sin(φ + θ)). We now need the trig identities for the sine and
cosine of a sum of two angles, which Mathematica gives below:


TrigExpand[Cos[φ + θ]]
Cos[θ] Cos[φ] − Sin[θ] Sin[φ]


TrigExpand[Sin[φ + θ]]
Cos[φ] Sin[θ] + Cos[θ] Sin[φ]


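Then, the point Q is given by

$$
\begin{aligned}
Q &= (r\cos(\phi + \theta),\; r\sin(\phi + \theta)) \\
  &= (r\cos(\phi)\cos(\theta) - r\sin(\phi)\sin(\theta),\; r\sin(\phi)\cos(\theta) + r\cos(\phi)\sin(\theta)) \\
  &= (\cos(\theta)\,x_0 - \sin(\theta)\,y_0,\; \cos(\theta)\,y_0 + \sin(\theta)\,x_0)
\end{aligned}
$$

where the last step uses x0 = r cos(φ) and y0 = r sin(φ).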


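Now, we have that Q, which has coordinates (x1, y1), can be expressed in terms of the
original coordinates (x0, y0) of P and the angle θ. We have from the last equation that

$$
(x_1, y_1) = (\cos(\theta)\,x_0 - \sin(\theta)\,y_0,\; \cos(\theta)\,y_0 + \sin(\theta)\,x_0) \tag{4.8}
$$

If we convert these rows to columns, then we have

$$
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
=
\begin{bmatrix} \cos(\theta)\,x_0 - \sin(\theta)\,y_0 \\ \sin(\theta)\,x_0 + \cos(\theta)\,y_0 \end{bmatrix}
$$

and by changing to matrix multiplication, we have

$$
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
=
\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{4.9}
$$

Now, we finally see that the 2 × 2 matrix

$$
A_\theta = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \tag{4.10}
$$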


carries out the rotation about the origin through the angle θ and it does so by left
multiplication, that is, Q = AP, where the initial point P and the new point Q are written
as column matrices. We can now perform the rotation of P about the origin through the
angle θ = 4π/5 using matrix multiplication.


(P = {{4.330127020}, {2.5}}) // MatrixForm

( 4.33013 )
( 2.5     )


A = {{Cos[θ], -Sin[θ]}, {Sin[θ], Cos[θ]}};
(A.P /. {θ -> 4 π/5}) // MatrixForm

( -4.97261 )
( 0.522642 )

We obtain the same result through matrix multiplication as we did above through 
trigonometry. 
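
The rotation matrix is convenient to wrap in a function of the angle. The sketch below
uses a hypothetical helper named rot (with argument ang) and compares it with
Mathematica's built-in RotationMatrix, which in two dimensions gives the
counterclockwise rotation matrix.

```mathematica
(* Hypothetical helper: counterclockwise rotation about the origin. *)
rot[ang_] := {{Cos[ang], -Sin[ang]}, {Sin[ang], Cos[ang]}};

(* Difference from the built-in 2D rotation matrix; expect a zero matrix. *)
Simplify[rot[ang] - RotationMatrix[ang]]

(* Cos^2 + Sin^2 = 1, so the determinant simplifies to 1. *)
Simplify[Det[rot[ang]]]

(* Rotate the point P of Example 4.4.1 through 4 Pi/5. *)
N[rot[4 Pi/5] . {5 Cos[Pi/6], 5 Sin[Pi/6]}]
```

The last line works with P as a flat list rather than a column matrix, so the result is
a flat list as well; its two entries should match the column computed above.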




The origin is the simplest point to rotate about. We will now change the
center of our rotation to the point C(h, k). If we want to rotate the point
P(x0, y0) through the angle θ about the center C to get the point Q, then we
can subtract C from P to get the new point R = P − C, which we can rotate
about the origin and then translate back to its correct location by adding
the point C back on. In other words, we can obtain Q(x1, y1) by

$$
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
=
\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}
\begin{bmatrix} x_0 - h \\ y_0 - k \end{bmatrix}
+
\begin{bmatrix} h \\ k \end{bmatrix} \tag{4.11}
$$
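
Formula (4.11) translates directly into a small function. This is a sketch under
naming chosen here: rotateAbout, p, c, and ang are hypothetical names, the built-in
RotationMatrix stands in for the rotation matrix, and points are written as flat lists
rather than column matrices.

```mathematica
(* Rotate point p about center c counterclockwise through angle ang. *)
rotateAbout[p_, c_, ang_] := RotationMatrix[ang] . (p - c) + c;

(* Rotating (2, 0) about (1, 0) by a quarter turn gives {1, 1}. *)
rotateAbout[{2, 0}, {1, 0}, Pi/2]

(* Rotating about the origin reduces to plain matrix multiplication. *)
rotateAbout[{x0, y0}, {0, 0}, ang]
```

Subtracting c first and adding it back afterward is exactly the translate, rotate,
translate sequence described in the paragraph above.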


Example 4.4.2. Instead of using a point for an example of this process, we will use this
rotation formula to rotate the four-petaled flower parametric curve

$$
(x(t), y(t)) = \bigl((3\cos(4t) + 2)\,2\cos(t) + 5,\; (3\cos(4t) + 2)\,2\sin(t) + 12\bigr)
$$

for t ∈ [0, 2π], about the point C(−4, −9) through the angle θ = 7π/8.


F = (3 Cos[4 t] + 2) 2 Cos[t] + 5;
G = (3 Cos[4 t] + 2) 2 Sin[t] + 12;


OrigPlot = ParametricPlot[{F, G}, {t, 0, 2 π}, PlotStyle -> {Blue,
     Thickness[0.01]}];


center = {{-4}, {-9}};
(P = {{F}, {G}}) // MatrixForm


( 5 + 2 Cos[t] (2 + 3 Cos[4 t])  )
( 12 + 2 (2 + 3 Cos[4 t]) Sin[t] )


Q = (A.(P - center) + center) /. {θ -> 7 π/8}

{{−4 − Cos[π/8] (9 + 2 Cos[t] (2 + 3 Cos[4 t])) − Sin[π/8] (21 + 2 (2 + 3 Cos[4 t]) Sin[t])},
 {−9 + (9 + 2 Cos[t] (2 + 3 Cos[4 t])) Sin[π/8] − Cos[π/8] (21 + 2 (2 + 3 Cos[4 t]) Sin[t])}}


RotatePlot = ParametricPlot[Flatten[Q], {t, 0, 2 π}, PlotStyle ->
    {Red, Thickness[0.01]}];


CenterPlot = ListPlot[{Flatten[center]}, PlotMarkers -> {"+",
     Medium}];


Show[OrigPlot, RotatePlot, CenterPlot, PlotRange -> All]


Figure 4.8: Original curve in the first quadrant, and rotated curve 
in the third. 




Figure 4.8 clearly depicts the rotation of the parametric curve. Now let us animate this 
rotation so that we can see some of the intermediate curves: 


QArb = A.(P - center) + center

{{−4 + (9 + 2 Cos[t] (2 + 3 Cos[4 t])) Cos[θ] − (21 + 2 (2 + 3 Cos[4 t]) Sin[t]) Sin[θ]},
 {−9 + Cos[θ] (21 + 2 (2 + 3 Cos[4 t]) Sin[t]) + (9 + 2 Cos[t] (2 + 3 Cos[4 t])) Sin[θ]}}


QArb /. {θ -> 7 π/8}

{{−4 − Cos[π/8] (9 + 2 Cos[t] (2 + 3 Cos[4 t])) − Sin[π/8] (21 + 2 (2 + 3 Cos[4 t]) Sin[t])},
 {−9 + (9 + 2 Cos[t] (2 + 3 Cos[4 t])) Sin[π/8] − Cos[π/8] (21 + 2 (2 + 3 Cos[4 t]) Sin[t])}}


Manipulate[QArbExp = Evaluate[QArb /. {θ -> thetaval}];
 QArbExpPlot = ParametricPlot[Flatten[QArbExp], {t, 0, 2 π},
   PlotStyle -> {Black, Thickness[0.01]}];
 Show[OrigPlot, QArbExpPlot, RotatePlot, PlotRange -> {{-40, 25},
    {-40, 25}}], {{thetaval, 0, θ}, 0, 7 π/8, π/16}]


Figure 4.9: Frame of the animation corresponding to θ = 3π/8.


Since we cannot display the actual animated graphic in the book, Figure 4.9 is the next
best thing. Figure 4.9 depicts the first and last curves in the rotation, as well as an
intermediary frame. In the animation itself, each frame constitutes a rotation by an angle
of π/16 in the counterclockwise direction starting with the original in the first quadrant,
culminating in a total rotation by 7π/8 represented by the final curve in the third quadrant.
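
For readers who want to reproduce the animation in one step, here is a compact,
self-contained sketch. It uses the built-in RotationMatrix instead of the matrix A
above, and the names curve, ctr, and ang are chosen here for illustration; the plot
range and the slider limits are copied from the Manipulate command in the example.

```mathematica
(* Flower curve of Example 4.4.2 as a function of the parameter t. *)
curve[t_] := {(3 Cos[4 t] + 2) 2 Cos[t] + 5, (3 Cos[4 t] + 2) 2 Sin[t] + 12};
ctr = {-4, -9}; (* center of rotation *)

(* The slider steps ang from 0 to 7 Pi/8 in increments of Pi/16. *)
Manipulate[
 ParametricPlot[RotationMatrix[ang] . (curve[t] - ctr) + ctr, {t, 0, 2 Pi},
  PlotRange -> {{-40, 25}, {-40, 25}}],
 {ang, 0, 7 Pi/8, Pi/16}]
```

Fixing the plot range keeps the axes from rescaling as the curve moves, which makes the
rotation easier to follow.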


Homework Problems 


1. Consider the point P(3, 0). Without using matrix multiplication,
find the resulting points Q, R, and S after rotating P about the origin
by angles π/4, π/2, and 3π/2, respectively.


2. Use matrix multiplication to perform the rotations in problem 1. 


3. Given a point P, let Q be the point corresponding to the rotation
of P about the origin through an angle θ. Let R be the point
corresponding to the rotation of Q about the origin through the
angle φ. Verify that

$$A_\phi A_\theta = A_{\phi+\theta}$$

and thus that

$$R = A_{\phi+\theta} P = A_\phi A_\theta P = A_\theta A_\phi P$$


4. Geometrically, the same property discussed in problem 3 should
hold for an arbitrary center of rotation. For instance, if we start with
a point P, rotate it through an angle θ to the point Q, and then rotate
Q through an angle φ to end up at R, this should be equivalent to
starting at P and rotating through an angle of φ + θ, independent of
the center. To show this, consider

$$Q = A_\theta (P - C) + C, \qquad R = A_\phi (Q - C) + C$$

and prove that

$$R = A_{\phi+\theta} (P - C) + C$$

5. Find the coordinates of the point Q corresponding to the point
P(3, 3) that has been rotated about the point C(1, 1) by an angle of
θ = π/4.


6. Given a point P, a point Q, and a center of rotation C, how can
one find the angle θ through which P was rotated to end up at Q?


7. Consider the points P(4, 5) and Q(2, 2√2 + 3) and center of
rotation C(2, 3). Determine the angle θ through which P was rotated
about C to end up at point Q.

8. Find the coordinates of the point Q corresponding to the point
P(3, 3) after it has been rotated about the point C(1, 3) by an angle
of θ = π. Consider this problem from a geometric perspective, and
explain how you could have known the answer without performing
any matrix multiplication.

9. As discussed in this section, the matrix $A_\theta$ corresponds to a
counterclockwise rotation about the origin. How can you modify
the matrix $A_\theta$ to perform clockwise rotations?




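10. The process of rotation about a point can be generalized to three
dimensions. Given a point P(x0, y0, z0), determine what rotations
the following matrices perform upon the point P:

$$
A_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & \sin(\theta) \\ 0 & -\sin(\theta) & \cos(\theta) \end{bmatrix}
$$

$$
A_2 = \begin{bmatrix} \cos(\theta) & 0 & -\sin(\theta) \\ 0 & 1 & 0 \\ \sin(\theta) & 0 & \cos(\theta) \end{bmatrix}
$$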


II 
n O 
=. 
> ® 
P 


Az 


11. (a) What 3 × 3 matrix R will carry out, by a single matrix
multiplication by R, the following three consecutive rotations in
space in the given order: first, rotate in space by the angle α about
the x-axis followed by a rotation by the angle β about the y-axis
followed by a rotation by the angle γ about the z-axis?


(b) Is it the same matrix R if we switch the order of these three
consecutive rotations? Explain.


12. How can you use matrix multiplication and addition/subtraction 
to rotate in space about a line parallel to one of the three coordinate 
axes? 


13. Using the information learned in this section, do (or redo) 
homework problem 7 of Section 4.1. 


Mathematica Problems 


1. Rotate the points P through the angles 0 with centers of rotation 
C. 




P(1,2), 9=%, C(0,0) (b) P(2,1), @=%, C(1,0) 


P(7,2), 0 = —2r, C(9,2) (d) P(0,0), @=%, C(2,3) 


P(1,1), @= 3m, C(-1,-1) (f) P(3,-1), 6=—%, C(3,0) 


4 


2. Using the results of homework problem 10, perform the 
following rotations: 


èla 


(a) Rotate P(1, 1, 1) about the x-axis by an angle of 0 


(b) Rotate P(1, 2, 1) about the z-axis by an angle of θ = π/3.

(c) Rotate P(1, 1, 2) about the y-axis by an angle of θ = 2π/3.

(d) Rotate P(1, 1, 1) about the z-axis by an angle of θ = π/4, then
take the resulting point and rotate it about the x-axis by an
angle of φ = π/3.

(e) Rotate P(1, 1, 1) about the x-axis by an angle of θ = π/3, then
take the resulting point and rotate it about the z-axis by an
angle of φ = π/4.

(f) Rotate P(2, 3, 4) about the x-axis by an angle of θ = π/3, then
take the resulting point and rotate it about the y-axis by an
angle of φ = 2π/3, and finally take this second point and rotate it
about the z-axis by an angle of ψ = π/4.


3. Construct a set of piecewise parametric functions that, when 
graphed, appear to be the first, middle, and last initials of your 
name. Rotate these initials about a point and animate the sequence. 


4. A complex number z can be written as follows: 


$$z = a + bi = |z|\,e^{i\phi} = |z|\cos(\phi) + |z|\sin(\phi)\,i$$


where $|z| = \sqrt{a^2 + b^2}$ is the modulus of z, the distance from z as
the point (a, b) to the origin, and φ is the angle between the complex
number z as the point (a, b) and the positive x-axis. Using complex
numbers instead of matrices, find a formula for rotating the point
P(x0, y0) about the center point C(h, k) counterclockwise through the
angle φ to get the new point Q. Use this formula for Q on an
example from this section to see that it is correct. (Hint: If you want
to rotate a point P(a, b) about the origin through the angle θ to get a
new point Q, then rewrite P as a complex number z and think of
what multiplying z by $e^{i\theta}$ will do to z.)


Research Projects 


1. Research the real quaternions ℍ, which are a generalization of the
complex numbers ℂ, and find out how they can be used to do rotations in
space.


4.5 An Application of Matrix Multiplication to Economics


A modern company typically makes many kinds of similar items and has a 
production quota to meet. In the production of all of these items, the 
company uses several types of labor in varying amounts. It therefore 
knows how many worker-hours of each type of labor (on average) it takes 
to make one type of item. The most convenient way to manipulate and 
change both production and worker-hour data is to place the data in a 
matrix, which is our mathematical version of a spreadsheet. The type of 
example we will look at concerns the areas of business related to 
production management, quality control, and efficiency since we will look 
at production, worker-hour usage, cost, and defective rate data for a 
quarter’s production for one company. 


Example 4.5.1. Our example concerns a large car manufacturer. It produces the five 
types of items: compact cars, sedans, sports cars, trucks, and SUVs using the four types 
of labor: upholstery work, metal work, electrical work, and general assembly. The 
following matrices T1 and P1 are the time matrix in worker-hours and the production or
order matrix for the first quarter of January, February, and March. We have


T1 =

labor per vehicle    compact car   sedan   sports car   truck   SUV
upholstery           0.735         1.105   0.825        0.765   0.855
metal                1.515         1.785   1.325        1.605   1.905
electrical           1.105         1.325   0.905        1.535   1.725
general assembly     2.465         2.535   2.015        2.935   3.265


and


P1 =

vehicle per month    January   February   March
compact car          7825      8635       8950
sedan                5085      5530       5815
sports car           1205      1365       1420
truck                4095      5140       5375
SUV                  3820      4235       4680

(T1Headings = {{"labor per vehicle", "compact car", "sedan", "sports car",
    "truck", "SUV"}, {"upholstery", .735, 1.105, .825, .765, .855},
   {"metal", 1.515, 1.785, 1.325, 1.605, 1.905}, {"electrical", 1.105,
    1.325, .905, 1.535, 1.725}, {"general assembly", 2.465, 2.535, 2.015,
    2.935, 3.265}}) // MatrixForm


labor per vehicle    compact car   sedan   sports car   truck   SUV
upholstery           0.735         1.105   0.825        0.765   0.855
metal                1.515         1.785   1.325        1.605   1.905
electrical           1.105         1.325   0.905        1.535   1.725
general assembly     2.465         2.535   2.015        2.935   3.265


(P1Headings = {{"vehicle per month", "January", "February", "March"},
   {"compact car", 7825, 8635, 8950}, {"sedan", 5085, 5530, 5815},
   {"sports car", 1205, 1365, 1420}, {"truck", 4095, 5140, 5375},
   {"SUV", 3820, 4235, 4680}}) // MatrixForm


vehicle per month    January   February   March
compact car          7825      8635       8950
sedan                5085      5530       5815
sports car           1205      1365       1420
truck                4095      5140       5375
SUV                  3820      4235       4680


We now want to know how many total worker-hours of each type of labor are needed for
each of the 3 months of the first quarter in order to exactly meet the production quota.
Call this matrix the total workforce matrix M1. You can get the matrix M1 by multiplying
T1 times P1, but first we must remove the headings. The command Drop will take care of
this removal: Drop[T1Headings, 1, 1] deletes the first row and the first column. After we
compute M1, its headings will be put in place.


(M1 = Drop[T1Headings, 1, 1].Drop[P1Headings, 1, 1]) // MatrixForm


18763.2  21136.5 22288.6 
36377.8 41079.1 43362.8 
29350.1 33299.5 35203.4 
59098.3 66967.5 70719.9 


(M1Headings = Join[{{"labor per month"}, {"upholstery"}, {"metal"},
    {"electrical"}, {"general assembly"}}, Join[{{"January", "February",
      "March"}}, M1], 2]) // MatrixForm


labor per month January February March 
upholstery 18763.2 21136.5 22288.6 
metal 36377.8  41079.1 43362.8 
electrical 29350.1 33299.5 35203.4 
general assembly 59098.3 66967.5 70719.9 
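
Two small idioms may help when working with labeled matrices like these. The sketch
below first shows what Drop[..., 1, 1] does on a tiny hypothetical table named small,
and then displays numbers with labels through the TableHeadings option of TableForm
instead of joining strings into the matrix. The numbers in the second part are the
upholstery and metal rows of M1.

```mathematica
(* A tiny labeled table: first row and first column hold headings. *)
small = {{"", "a", "b"}, {"r1", 1, 2}, {"r2", 3, 4}};

(* Drop the first row and the first column, leaving the numeric block. *)
Drop[small, 1, 1]   (* {{1, 2}, {3, 4}} *)

(* Display numbers with row and column labels without altering the data. *)
TableForm[{{18763.2, 21136.5, 22288.6}, {36377.8, 41079.1, 43362.8}},
 TableHeadings -> {{"upholstery", "metal"}, {"January", "February", "March"}}]
```

Keeping the labels in the display rather than in the matrix means the numeric matrix can
be multiplied again later without another call to Drop.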


If we divide the total workforce matrix M1 by 8, then we know how many people must be 
scheduled per month for each type of labor in order to meet first-quarter production, 
assuming that one person does an 8-hour shift. The entries in this matrix should be 
rounded up to obtain integer values. The command Ceiling will round up to the next 
integer. 


(People1 = Ceiling[1/8 M1]) // MatrixForm


2346 2643 2787 
4548 5135 5421 
3669 4163 4401 
7388 8371 8840 


(People1Headings = Join[{{"people per month"}, {"upholsterers"},
    {"metal workers"}, {"electrical workers"}, {"general assembly workers"}},
   Join[{{"January", "February", "March"}}, People1], 2]) // MatrixForm


people per month January February March 
upholsterers 2346 2643 2787 
metal workers 4548 5135 5421 
electrical workers 3669 4163 4401 
general assembly workers 7388 8371 8840 


Let us assume that it is normal for the second-quarter production orders to go up across
the board by 4.15%, while increased labor efficiency reduces the worker-hours needed per
vehicle by 6.25%. Then our second-quarter production matrix is P2 = (1 + 0.0415) P1 =
1.0415 P1, while the second-quarter time matrix is T2 = (1 − 0.0625) T1 = 0.9375 T1. So the
second-quarter total workforce matrix is:

$$
\begin{aligned}
M_2 &= T_2 P_2 \\
    &= (0.9375\,T_1)(1.0415\,P_1) \\
    &= 0.9375 \cdot 1.0415\,(T_1 P_1) \\
    &= 0.97640625\,M_1
\end{aligned}
$$


This says that the total need for workforce has decreased by 2.36% from the first to 
second quarter. Let us now compute these matrices to see if this is correct: 


0.9375*1.0415
0.976406


1 - 0.97640625
0.0235937


(T2 = .9375 Drop[T1Headings, 1, 1]) // MatrixForm 


0.689063   1.03594   0.773438   0.717188   0.801563
1.42031    1.67344   1.24219    1.50469    1.78594
1.03594    1.24219   0.848438   1.43906    1.61719
2.31094    2.37656   1.88906    2.75156    3.06094


(P2 = 1.0415 Drop[P1Headings, 1, 1] ) // MatrixForm 


8149.74   8993.35   9321.43
5296.03   5759.5    6056.32
1255.01   1421.65   1478.93
4264.94   5353.31   5598.06
3978.53   4410.75   4874.22


(M2 = T2.P2) // MatrixForm 


18320.5   20637.8   21762.7
35519.5   40109.9   42339.7
28657.6   32513.9   34372.8
57703.9   65387.5   69051.4


0.97640625 M1 // MatrixForm 


18320.5 20637.8 21762.7 
35519.5 40109.9 42339.7 
28657.6 32513.9 34372.8 
57703.9 65387.5 69051.4 


(People2 = Ceiling[1/8 M2]) // MatrixForm 


2291 2580 2721 
4440 5014 5293 
3583 4065 4297 
7213 8174 8632 


(People2Headings = Join[{{"people per month"}, {"upholsterers"},
    {"metal workers"}, {"electrical workers"}, {"general assembly workers"}},
   Join[{{"April", "May", "June"}}, People2], 2]) // MatrixForm


people per month           April   May    June
upholsterers               2291    2580   2721
metal workers              4440    5014   5293
electrical workers         3583    4065   4297
general assembly workers   7213    8174   8632


Let us return to the first quarter. After each type of vehicle is produced, it is driven off the 
assembly line, and given a thorough quality control test in order to find any defects 
before it is sent to the dealership. We have the defective rate matrix D1 for the first 
quarter as 


D1 =

labor \ vehicle   compact car   sedan    sports car   truck    SUV
upholstery        0.825%        1.15%    0.615%       0.905%   1.25%
metal             1.5%          1.785%   1.125%       1.905%   1.815%
electrical        0.735%        0.775%   0.615%       0.845%   0.875%
gen. assembly     1.65%         1.875%   1.415%       1.925%   1.965%


and the average cost of repair matrix R1 for the first quarter as


R1 =

vehicle \ labor type   upholstery   metal    electric   gen. assembly
compact car            $7.85        $15.10   $9.15      $35.20
sedan                  $10.25       $18.90   $11.25     $42.70
sports car             $8.15        $15.75   $9.45      $29.65
truck                  $13.95       $21.05   $11.80     $51.75
SUV                    $14.35       $20.60   $10.95     $48.15


How much does it cost in each of the months of the first quarter to make these repairs on
these five types of vehicles? The answer is R1 D1 P1, since D1 P1 is the number of
defective vehicles for each type of labor per month of the first quarter. In order to have
Mathematica compute this product for us, we need to delete all headings from these
matrices, as well as the % and $ notation. We can also ask Mathematica to compute the
total repair cost for the first quarter by summing the entries in the matrix R1 D1 P1.


D1 = {{.00825, .0115, .00615, .00905, .0125}, {.015, .01785, .01125,
.01905, .01815}, {.00735, .00775, .00615, .00845, .00875}, {.0165,
.01875, .01415, .01925, .01965}};

Ceiling[D1.Drop[P1Headings, 1, 1]] // MatrixForm 


216 243 257 
370 419 442 
173 196 206 
396 448 473 


R1 = {{7.85, 15.10, 9.15, 35.20}, {10.25, 18.90, 11.25, 42.70}, {8.15, 
15.75, 9.45, 29.65}, {13.95, 21.05, 11.80, 51.75}, {14.35, 20.60, 10.95, 
48.15}}; 


(Costs1 = Ceiling[R1.D1.Drop[P1Headings, 1, 1]]) // MatrixForm


22758 25766 27186 
28004 31706 33453 
20920 23685 24991 
33267 37662 39739 
31617 35793 37768 


(Costs1Headings = Join[{{"type of vehicle per month"}, {"compact car"},
{"sedan"}, {"sports car"}, {"truck"}, {"SUV"}},
Join[{{"January", "February", "March"}}, Costs1], 2]) // MatrixForm


type of vehicle per month January February March 


compact car 22758 25766 27186 
sedan 28004 31706 33453 
sports car 20920 23685 24991 
truck 33267 37662 39739 
SUV 31617 35793 37768 


The total cost of making repairs under quality control to the five types of vehicles
from the four kinds of work for the first quarter is the sum of the entries of the Costs1
matrix. It is found below to be $454,315:


Total[Drop[Costs1Headings, 1, 1], 2] 


454315 
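
The same total can also be reached without the headings, and the level argument of Total gives the monthly and per-vehicle subtotals along the way. The following is only a sketch: the lowercase name costs1 is a hypothetical copy of the Costs1 values printed above, introduced so that nothing defined in the example is overwritten.

```mathematica
(* Hypothetical copy of the Costs1 entries printed above:
   rows are vehicle types, columns are January, February, March *)
costs1 = {{22758, 25766, 27186}, {28004, 31706, 33453},
   {20920, 23685, 24991}, {33267, 37662, 39739}, {31617, 35793, 37768}};

Total[costs1]        (* repair cost per month, summed over the vehicle types *)
Total[costs1, {2}]   (* repair cost per vehicle type, summed over the quarter *)
Total[costs1, 2]     (* grand total for the quarter: 454315 *)
```

The three monthly subtotals add up to the same grand total, which is a quick consistency check on the Costs1 matrix.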




Mathematica Problems 


All of the following Mathematica problems will use the information given 
in the example from this section as this year’s data, where each one is a 
continuation of the previous problem. In each of these Mathematica 
problems, compute the required matrices with and without their 
appropriate headings. 
1. In next year’s first quarter, it is anticipated that all of the 
following will occur: compact car orders will go up by an average 
of 3.75% each month, sedan orders will go down by an average of 
2.95% each month, sports car orders will go down by an average of 
4.30% each month, truck orders will go up by an average of 1.65% 
each month, and SUV orders will go up by an average of 2.15% 
each month. What is next year’s production matrix P1 for the first 
quarter? 
2. In next year’s first quarter, it is anticipated that all of the 
following will occur as a result of changes in practices and 
automation: next year’s first quarter worker-hours for upholstery 
will go down by an average of 2.55% for each type of vehicle, 
worker-hours for metal work will go up by an average of 1.35% for 
each type of vehicle, worker-hours for electrical work will go down 
by an average of 3.75% for each type of vehicle and worker-hours 
for general assembly will go down by an average of 0.85% for each 
type of vehicle. What is next year’s time matrix T1 for the first
quarter?
3. (a) In next year’s first quarter, what is the total workforce matrix
M1?
(b) What is the total number of people (People1 matrix) needed
for next year’s first quarter by month and by the type of labor
done if one person is needed to fill one 8-hour shift per day?
(c) Upholsterers make an average of $21.25 per hour, including 
benefits, metal workers make an average of $24.50 per hour, 
including benefits, electrical workers make an average of 
$26.75 per hour, including benefits and general assembly 
workers make an average of $27.85 per hour, including 
benefits. What is the cost matrix C1 for next year’s first quarter 
by month and type of labor? What is the total labor cost for 
next year’s first quarter for these four types of assembly line 




workers? What is the average salary including benefits for next 
year’s first quarter for all of these four types of assembly line 
workers? 
4. (a) In next year’s second quarter, what is the production matrix
P2 if it reflects an across-the-board average drop in orders of 2.65%
from the first quarter of next year?
(b) In next year’s second quarter, what is the time matrix T2 if 
it reflects an across-the-board average rise in times by 1.35% 
from the first quarter of next year due to new safety 
regulations? 
(c) In next year’s second quarter, what is the total workforce 
matrix M2, and by what percentage has the need for labor 
changed from the first quarter to the second quarter of next 
year? 
(d) What is the total number of people (People2 matrix) needed
for next year’s second quarter by month and by the type of 
labor done if one person is needed to fill one 8-hour shift per 
day? 
(e) If all labor costs have gone up from the first quarter of next 
year to the second quarter of next year by an average of 2.25%, 
then what is the cost matrix C2 for next year’s second quarter 
by month and type of labor? What is the total labor cost for 
next year’s second quarter for these four types of assembly line 
workers? What is the average salary including benefits for next 
year’s second quarter for all of these four types of assembly 
line workers? 
5. (a) Find the defective rate matrix D2 for the second quarter of 
next year if it is anticipated that these defective rates will have gone 
down by an average of 1.75% from the first quarter of this year. 
(b) Find the average cost of repair matrix R2 for the second 
quarter of next year if it is anticipated that these repair costs 
will have gone up by an average of 2.15% from the first quarter 
of this year. 
(c) Find the repair costs matrix Costs2 for the second quarter of 
next year. 




(d) Find the total repair costs for the second quarter of next 
year. What is the average repair cost per vehicle produced for 
the second quarter of next year? 




Chapter 5


Determinants, Inverses, and Cramer’s Rule


5.1 Determinants and Inverses from the Adjoint Formula


We introduced matrix arithmetic in Chapter 2; however, we left out matrix
division. We now concern ourselves with extending matrix arithmetic to
matrix division, or more precisely, multiplication by the multiplicative
inverse. The need for matrix multiplicative inverses arises in the
following situation: If we are given a linear system expressed as the
matrix equation AX = B, where A is the coefficient matrix, X the variable
column, and B the column of the RHS values, then we want to know when
we can divide by A, or equivalently, multiply the entire equation by A’s
multiplicative inverse $A^{-1}$ to obtain the unique solution $X = A^{-1}B$ to the
system.


When might our matrix equation AX = B have a unique solution of the
form $X = A^{-1}B$? From our past experience, we know that it is usual to get
a single solution to a linear system only when it is square, which occurs
only when the matrix of coefficients A is square. This tells us that we
should only look for multiplicative inverses $A^{-1}$ for square matrices A.




Remember that in the set of square matrices $\mathbb{R}^{n \times n}$, the (multiplicative)
identity matrix $I_n$ was defined to be the n x n matrix with ones on its main
diagonal from upper left to lower right and zeros everywhere else. For any
$A \in \mathbb{R}^{n \times n}$ we have both $AI_n = A$ and $I_nA = A$. Now, if A has a
multiplicative inverse K with $K \in \mathbb{R}^{n \times n}$, then K must satisfy both $AK = I_n$
and $KA = I_n$. These last two equations define what we mean by the
multiplicative inverse, $A^{-1}$, of a square matrix A.


Definition 5.1.1. The inverse of an n x n matrix A, denoted $A^{-1}$, is
the unique matrix of dimension n x n that satisfies $AA^{-1} = A^{-1}A = I_n$.


The simplest case to study is a 1 x 1 matrix [a] with only one entry, a. The
real numbers $\mathbb{R}$ are the same as the 1 x 1 real matrices [a], with the same
multiplication, and so if $a \neq 0$, then


$$[a]^{-1} = \left[a^{-1}\right] = \left[\frac{1}{a}\right]$$


We now move on to the case of 2 x 2 matrices and ask the questions:
When, and how, does a matrix $A \in \mathbb{R}^{2 \times 2}$ have a multiplicative inverse? It
should be clear that not all 2 x 2 matrices will have multiplicative
inverses, since there should be matrices that have properties similar to
those of the number zero in regard to multiplicative inverses in the scalar
case.


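Example 5.1.1. Let us see an example of a 2 x 2 matrix that has no multiplicative
inverse. If


$$A = \begin{bmatrix} 0 & 0 \\ 0 & d \end{bmatrix}$$

for $d \in \mathbb{R}$, then A has no inverse K. To see this, if we arbitrarily define

$$K = \begin{bmatrix} \alpha & \beta \\ \theta & \delta \end{bmatrix}$$

then no choice of $\alpha$ through $\delta$ will let


$$AK = I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$


since the multiplication gives


$$AK = \begin{bmatrix} 0 & 0 \\ d\theta & d\delta \end{bmatrix}$$

whose first row is always zero.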


We will have Mathematica verify these calculations quickly: 


A = {{0, 0}, {0, d}};
K = {{α, β}, {θ, δ}};
A.K // MatrixForm


0 0
d θ d δ


From this one simple example, it becomes immediately apparent that there
are infinitely many 2 x 2 matrices that have no multiplicative
inverse. This is very different from the scalar case, where zero was the
only number with no multiplicative inverse. Now, we must strive to find a
general method to determine whether an arbitrary matrix has an
inverse, and if an inverse exists, we must also find a method of computing
it. Our next step is to generalize our matrix $A \in \mathbb{R}^{2 \times 2}$ and attempt to solve
our multiplicative inverse problem. We will treat this problem as a system
of four equations with four unknowns. Let


$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$


for real constants a through d. We want to find the matrix


$$K = \begin{bmatrix} \alpha & \beta \\ \theta & \delta \end{bmatrix}$$


so that $AK = I_2$. The matrix equation $AK = I_2$ will give us a linear system
of four equations in the unknowns $\alpha$ through $\delta$. Although it is not difficult
to perform the algebra, we will have Mathematica find the formulas for
the unknown variables $\alpha$ through $\delta$ in terms of the known constants a through
d:




(A = {{a, b}, {c, d}}) // MatrixForm


a b
c d

(P = A.K) // MatrixForm


a α + b θ   a β + b δ
c α + d θ   c β + d δ


Eqn1 = P[[1, 1]] == 1
a α + b θ == 1

Eqn2 = P[[1, 2]] == 0
a β + b δ == 0

Eqn3 = P[[2, 1]] == 0
c α + d θ == 0

Eqn4 = P[[2, 2]] == 1
c β + d δ == 1


Solve[{Eqn1, Eqn2, Eqn3, Eqn4}, {α, β, θ, δ}]


{{α -> d/(-b c + a d), β -> -b/(-b c + a d), θ -> -c/(-b c + a d), δ -> a/(-b c + a d)}}


This tells us that the multiplicative inverse of A has the form


$$K = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$




as long as $ad - bc \neq 0$. Thus, if A is defined as before, then we say that $A^{-1} = K$
provided that $ad - bc \neq 0$. The value ad - bc for the 2 x 2 matrix A is
called its determinant, since its value being zero or nonzero determines
whether A has an inverse. The Mathematica command Det will compute
the determinant of a matrix; furthermore, the command Inverse will find
the inverse of a square matrix, assuming that it exists.


(K = (1/(a d - b c)) {{d, -b}, {-c, a}}) // MatrixForm


d/(-b c + a d)    -b/(-b c + a d)
-c/(-b c + a d)   a/(-b c + a d)


Simplify[A.K] // MatrixForm


1 0
0 1


Simplify[K.A] // MatrixForm


1 0
0 1


Det[A]
-b c + a d


Inverse[A] // MatrixForm


d/(-b c + a d)    -b/(-b c + a d)
-c/(-b c + a d)   a/(-b c + a d)

Using the commands given above, Mathematica has checked the formula
for the inverse K of A and shown that both $AK = I_2$ and $KA = I_2$ are true.
In the formula for $A^{-1}$ above, we see that each entry resembles the
determinant of a 1 x 1 matrix (which is the number itself), divided by the
determinant of A. In fact, we have




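$$
\begin{aligned}
A^{-1}_{1,1} &= \frac{d}{ad - bc} = \frac{1}{\det(A)} \det([d]) \\
&= \frac{1}{\det(A)} \det(\text{matrix } A \text{ with first row and column removed}) \\
A^{-1}_{2,2} &= \frac{a}{ad - bc} = \frac{1}{\det(A)} \det([a]) \\
&= \frac{1}{\det(A)} \det(\text{matrix } A \text{ with second row and column removed}) \\
A^{-1}_{1,2} &= \frac{-b}{ad - bc} = -\frac{1}{\det(A)} \det([b]) \\
&= -\frac{1}{\det(A)} \det(\text{matrix } A \text{ with second row and first column removed}) \\
A^{-1}_{2,1} &= \frac{-c}{ad - bc} = -\frac{1}{\det(A)} \det([c]) \\
&= -\frac{1}{\det(A)} \det(\text{matrix } A \text{ with first row and second column removed})
\end{aligned}
$$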


Now that the 2 x 2 case has been solved, we want to find a pattern for
computing the entries of $A^{-1}$, where A is a square matrix of arbitrary
dimension n x n. To do this, we will need to introduce some new
terminology.

Definition 5.1.2. Given $A \in \mathbb{R}^{m \times n}$, the transpose of A, denoted $A^T$, is an
element of $\mathbb{R}^{n \times m}$ defined by $A^T_{i,j} = A_{j,i}$.

Simply put, the transpose of a matrix is computed by swapping the rows
and columns of the matrix.
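
As a quick illustration of Definition 5.1.2, the built-in command Transpose swaps rows and columns. This is a small sketch only; the 2 x 3 matrix m below is a hypothetical example and does not come from the text.

```mathematica
(* Hypothetical 2 x 3 example matrix *)
m = {{1, 2, 3}, {4, 5, 6}};

(mT = Transpose[m]) // MatrixForm   (* rows of m become columns of mT *)
Dimensions[mT]                      (* {3, 2}: an element of R^(3 x 2) *)

(* Entry-by-entry check of the defining relation: mT[[i, j]] equals m[[j, i]] *)
Table[mT[[i, j]] == m[[j, i]], {i, 3}, {j, 2}]
```

Every entry of the final table is True, which is exactly the statement of the definition.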


Example 5.1.2. If $A \in \mathbb{R}^{3 \times 3}$ and the matrix B are given by


(5.2) 
i -2 3 

A=|4 5 1], B=| 7}? 9? 
~ 3 =j 


then $A^T \in \mathbb{R}^{3 \times 3}$ and $B^T$ are given, respectively, by




14 0 R : 

AT-| -25 31], BT= ò + 
Ss 1 =] | 

4 1 


Definition 5.1.3. Given $A \in \mathbb{R}^{n \times n}$, the minor $M_{i,j}$ of the element $A_{i,j}$ is the
determinant of the resulting (n - 1) x (n - 1) matrix found by removing
row i and column j from A.


Note that the minor of element $A_{i,j}$ is a scalar, found by taking the
determinant of a matrix of dimension (n - 1) x (n - 1).


Example 5.1.3. If we consider the matrix A from equation (5.2), which was used as an
example for the transpose operation, the minor $M_{1,1}$ of $A_{1,1}$ is given by


$$M_{1,1} = \det\left(\begin{bmatrix} 5 & 1 \\ 3 & -1 \end{bmatrix}\right) = -8$$


On the other hand, the minor $M_{2,3}$ of $A_{2,3}$ is calculated as


$$M_{2,3} = \det\left(\begin{bmatrix} 1 & -2 \\ 0 & 3 \end{bmatrix}\right) = 3$$
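
The two minors above can be reproduced in Mathematica by deleting a row and a column with Drop and then applying Det. This is a sketch under stated assumptions: the names A52 (the matrix A of equation (5.2)) and minor are hypothetical helpers introduced here, not commands used elsewhere in the text.

```mathematica
(* The matrix A of equation (5.2), stored under a hypothetical name *)
A52 = {{1, -2, 2}, {4, 5, 1}, {0, 3, -1}};

(* Hypothetical helper: remove row i and column j, then take the determinant *)
minor[m_, i_, j_] := Det[Drop[m, {i}, {j}]];

minor[A52, 1, 1]   (* -8 *)
minor[A52, 2, 3]   (* 3 *)
```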


We now turn to the arbitrary 3 x 3 matrix inverse case. In the process, we
will find the determinant of a 3 x 3 matrix. We define the matrix A and its
potential inverse K as follows:


$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}, \qquad K = \begin{bmatrix} p & q & r \\ u & v & w \\ x & y & z \end{bmatrix}$$


for real constants a through i. We require that $KA = I_3$ and $AK = I_3$. We
choose the latter equation, $AK = I_3$, which will give us a linear system of
nine equations in the unknowns p through z. We will have Mathematica
do the work for us and find the formulas for p through z in terms of a
through i:




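A = {{a, b, c}, {d, e, f}, {g, h, i}};
K = {{p, q, r}, {u, v, w}, {x, y, z}};
(P = A.K) // MatrixForm


a p + b u + c x   a q + b v + c y   a r + b w + c z
d p + e u + f x   d q + e v + f y   d r + e w + f z
g p + h u + i x   g q + h v + i y   g r + h w + i z


Solns = Simplify[Solve[{P == IdentityMatrix[3]}, {p, q, r, u, v, w,
x, y, z}]]


{{p -> (f h - e i)/(c e g - b f g - c d h + a f h + b d i - a e i),
x -> (e g - d h)/(c e g - b f g - c d h + a f h + b d i - a e i),
u -> (f g - d i)/(-c e g + b f g + c d h - a f h - b d i + a e i),
q -> (c h - b i)/(-c e g + b f g + c d h - a f h - b d i + a e i),
v -> (c g - a i)/(c e g - b f g - c d h + a f h + b d i - a e i),
y -> (b g - a h)/(-c e g + b f g + c d h - a f h - b d i + a e i),
r -> (c e - b f)/(c e g - b f g - c d h + a f h + b d i - a e i),
w -> (c d - a f)/(-c e g + b f g + c d h - a f h - b d i + a e i),
z -> (b d - a e)/(c e g - b f g - c d h + a f h + b d i - a e i)}}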

This solution set tells us that the multiplicative inverse K of A has the
formula


$$K = \frac{1}{-ceg + bfg + cdh - afh - bdi + aei} \begin{bmatrix} ei - fh & ch - bi & bf - ce \\ fg - di & ai - cg & cd - af \\ dh - eg & bg - ah & ae - bd \end{bmatrix} \tag{5.3}$$


as long as $-ceg + bfg + cdh - afh - bdi + aei \neq 0$. Similar to the 2 x 2
case, the term required to be nonzero is a determinant for 3 x 3 matrices.
Once again, we denote $K = A^{-1}$.


K = Simplify[K /. Solns[[1]]];
Simplify[A.K] // MatrixForm


1 0 0
0 1 0
0 0 1

Simplify[K.A] // MatrixForm

1 0 0
0 1 0
0 0 1

Det[A]
-c e g + b f g + c d h - a f h - b d i + a e i


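In the formula for $A^{-1}$ given in equation (5.3), we see that each entry looks
like the determinant of a 2 x 2 matrix divided by the determinant of A. In
fact


$$
\begin{aligned}
A^{-1}_{1,1} &= \frac{ei - fh}{\det(A)} = \frac{1}{\det(A)} \det\left(\begin{bmatrix} e & f \\ h & i \end{bmatrix}\right) \\
&= \frac{1}{\det(A)} \det(\text{matrix } A \text{ with first row and first column removed})
\end{aligned}
$$


Let us check one more entry of $A^{-1}$, but off the main diagonal. We have


$$
\begin{aligned}
A^{-1}_{3,2} &= \frac{-ah + gb}{\det(A)} = \frac{-1}{\det(A)} \det\left(\begin{bmatrix} a & b \\ g & h \end{bmatrix}\right) \\
&= \frac{(-1)^{2+3}}{\det(A)} \det(\text{matrix } A \text{ with second row and third column removed})
\end{aligned}
$$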


Furthermore, if we look at the term that we defined as the determinant of 
the 3 x 3 matrix, notice that we can factor it as 


—ceg +bfg + cdh —afh — bdi + aei = a(ei — fh) — b(di — fg) + c(dh — eg) 


or 

—ceg + bfg + cdh — afh — bdi + aei = a(ei — fh) — d(bi — ch) + g(bf — ce) 
The terms in parentheses are minors of the original matrix obtained by 
using the first row and column, respectively. We leave it as an exercise to 


rewrite the LHS in terms of minors using second and third rows and 
columns. It can be done! 


We can now define the determinant of a square n x n matrix. We will
define the determinant as the expansion along the first row of the matrix
using the determinants (minors) of one size smaller matrices, contingent on
it being valid for n x n matrices of sizes n = 1, 2, 3.


Definition 5.1.4. The determinant of an n x n matrix A, denoted det(A), is
the scalar given by the formula


$$\det(A) = \sum_{j=1}^{n} (-1)^{1+j} A_{1,j} M_{1,j} \tag{5.4}$$


Definition 5.1.5. The cofactor $C_{i,j}$ of $A_{i,j}$ is defined to be $C_{i,j} = (-1)^{i+j} M_{i,j}$.


Using the two minors calculated previously, notice that $C_{1,1} = (-1)^{1+1} \cdot (-8) = -8$,
and $C_{2,3} = (-1)^{2+3} \cdot 3 = -3$.


Definition 5.1.6. The cofactor matrix C of the same square size as A
consists of the cofactors of A:


$$C_{i,j} = (-1)^{i+j} M_{i,j} \tag{5.5}$$




Closely related to the cofactor matrix is the adjoint matrix, which we 
define next. 


Definition 5.1.7. The adjoint of a square matrix A, denoted adj(A), is the
transpose of its cofactor matrix C:


$$\operatorname{adj}(A) = C^T \tag{5.6}$$


Example 5.1.4. We have already computed two entries of the cofactor matrix of


$$A = \begin{bmatrix} 1 & -2 & 2 \\ 4 & 5 & 1 \\ 0 & 3 & -1 \end{bmatrix}$$


given in (5.2); we leave it as an exercise for you to find the rest. The following is the
cofactor matrix of A:


$$C = \begin{bmatrix} -8 & 4 & 12 \\ 4 & -1 & -3 \\ -12 & 7 & 13 \end{bmatrix}$$
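
One way to check the remaining cofactors is to build the whole cofactor matrix at once from Definition 5.1.6. The following is a minimal sketch; the names A52 and cofactorMatrix are hypothetical and are introduced only for this illustration.

```mathematica
(* The matrix A of equation (5.2), stored under a hypothetical name *)
A52 = {{1, -2, 2}, {4, 5, 1}, {0, 3, -1}};

(* Hypothetical helper: entry (i, j) is (-1)^(i+j) times the minor obtained
   by dropping row i and column j. Intended for matrices of size 2 x 2 or larger. *)
cofactorMatrix[m_] := Table[(-1)^(i + j) Det[Drop[m, {i}, {j}]],
   {i, Length[m]}, {j, Length[m]}];

cofactorMatrix[A52] // MatrixForm
```

The output agrees with the matrix C displayed in Example 5.1.4.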


The adjoint matrix is simple to calculate once we have the cofactor matrix. 
With the definitions of the cofactor and adjoint matrices, we now have 
enough information to define the inverse of a square matrix. 


Definition 5.1.8. Given a matrix $A \in \mathbb{R}^{n \times n}$, the inverse matrix, denoted
$A^{-1}$, is defined by the following formula, where C is the cofactor matrix of
A:


$$A^{-1} = \frac{1}{\det(A)} C^T \tag{5.7}$$


If det(A) = 0, then the matrix A has no inverse, and is said to be singular;
otherwise we call the matrix A nonsingular, or invertible.


The formula given above is referred to as the adjoint formula. We now
introduce some properties of the matrix inverse and matrix transpose in
the following table. The formulas given hold for arbitrary matrices A and
B, and scalar c, assuming the standard matrix operations of addition and
multiplication can be performed on A and B. Notice that there is no
formula to relate $(A + B)^{-1}$ to the sum of inverses.


Matrix Transpose and Inverse Properties


Transpose | $(A + B)^T = A^T + B^T$
          | $(A^T)^T = A$
          | $(cA)^T = cA^T$
          | $(AB)^T = B^T A^T$
Inverse   | $(AB)^{-1} = B^{-1} A^{-1}$
          | $(A^{-1})^{-1} = A$
          | $(cA)^{-1} = \frac{1}{c} A^{-1}$
          | $A A^{-1} = I_n$
          | $A^{-1} A = I_n$
Combined  | $(A^T)^{-1} = (A^{-1})^T$
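
Several rows of the table can be spot-checked numerically. The sketch below uses the matrix A of (5.2) under the hypothetical name A52 together with a second matrix B3 that is made up for this illustration (its determinant is 7, so it is invertible). A numeric check of this kind is not a proof, only a sanity test.

```mathematica
A52 = {{1, -2, 2}, {4, 5, 1}, {0, 3, -1}};   (* matrix A of (5.2) *)
B3 = {{2, 0, 1}, {1, 3, 0}, {0, 1, 1}};      (* hypothetical invertible matrix *)

Transpose[A52.B3] == Transpose[B3].Transpose[A52]   (* (AB)^T = B^T A^T *)
Inverse[A52.B3] == Inverse[B3].Inverse[A52]         (* (AB)^-1 = B^-1 A^-1 *)
Inverse[5 A52] == Inverse[A52]/5                    (* (cA)^-1 = (1/c) A^-1 *)
Inverse[Transpose[A52]] == Transpose[Inverse[A52]]  (* (A^T)^-1 = (A^-1)^T *)

(* The inverse of a sum is NOT the sum of the inverses for this pair *)
Inverse[A52 + B3] == Inverse[A52] + Inverse[B3]
```

The first four comparisons return True, while the last returns False for these two matrices.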


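Example 5.1.5. Putting all of the previous definitions together, we can continue our work
on the inverse of the matrix A from (5.2). The adjoint of A is given by


$$\operatorname{adj}(A) = \begin{bmatrix} -8 & 4 & -12 \\ 4 & -1 & 7 \\ 12 & -3 & 13 \end{bmatrix}$$


To compute $A^{-1}$ using the adjoint formula (5.7), we need one last piece of information,
det(A). We have previously defined this scalar for arbitrary 3 x 3 matrices, and applying
the formula to the matrix A yields det(A) = 8. Now we have enough information to
compute the inverse of A:


$$A^{-1} = \frac{1}{8} \begin{bmatrix} -8 & 4 & -12 \\ 4 & -1 & 7 \\ 12 & -3 & 13 \end{bmatrix} = \begin{bmatrix} -1 & \frac{1}{2} & -\frac{3}{2} \\ \frac{1}{2} & -\frac{1}{8} & \frac{7}{8} \\ \frac{3}{2} & -\frac{3}{8} & \frac{13}{8} \end{bmatrix}$$


We can verify through matrix multiplication that we really have found the inverse of A.
Remember that we should end up with $AA^{-1} = A^{-1}A = I_3$:


$$\begin{bmatrix} 1 & -2 & 2 \\ 4 & 5 & 1 \\ 0 & 3 & -1 \end{bmatrix} \begin{bmatrix} -1 & \frac{1}{2} & -\frac{3}{2} \\ \frac{1}{2} & -\frac{1}{8} & \frac{7}{8} \\ \frac{3}{2} & -\frac{3}{8} & \frac{13}{8} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$\begin{bmatrix} -1 & \frac{1}{2} & -\frac{3}{2} \\ \frac{1}{2} & -\frac{1}{8} & \frac{7}{8} \\ \frac{3}{2} & -\frac{3}{8} & \frac{13}{8} \end{bmatrix} \begin{bmatrix} 1 & -2 & 2 \\ 4 & 5 & 1 \\ 0 & 3 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$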


Example 5.1.6. Now, let us use the inverse of a square matrix to solve a square linear
system. We wish to solve the system


$$
\begin{aligned}
5x - 7y + 2z - w &= -3 \\
9x + 3y - 5z + 8w &= -11 \\
-6x + y - z + 7w &= 0 \\
x - 4y - 3z + 5w &= 6
\end{aligned}
\tag{5.8}
$$

for the unique solution $X = A^{-1}B$:

Solve[{5 x - 7 y + 2 z - w == -3, 9 x + 3 y - 5 z + 8 w == -11,
-6 x + y - z + 7 w == 0, x - 4 y - 3 z + 5 w == 6}, {x, y, z, w}]

{{x -> -1106/837, y -> -340/279, z -> -2651/837, w -> -1181/837}}


(A = {{5, -7, 2, -1}, {9, 3, -5, 8}, {-6, 1, -1, 7}, {1, -4, -3,
5}}) // MatrixForm


5 -7 2 -1
9 3 -5 8
-6 1 -1 7
1 -4 -3 5


(AInv = Inverse[A]) // MatrixForm


157/2511   19/279   -58/2511   -161/2511
-37/837    5/93     19/837     -106/837
592/2511   13/279   533/2511   -815/2511
235/2511   16/279   377/2511   -209/2511

(B = {{-3}, {-11}, {0}, {6}}) // MatrixForm

-3
-11
0
6


AInv.B // MatrixForm


-1106/837
-340/279
-2651/837
-1181/837


RowReduce[Join[A, B, 2]] // MatrixForm


1 0 0 0 -1106/837
0 1 0 0 -340/279
0 0 1 0 -2651/837
0 0 0 1 -1181/837


All three methods (Solve, Inverse, and RowReduce) used above to solve
this system agree on the final answer. One may wonder what the
advantage would be of solving a system of equations using matrix
inverses, as opposed to performing Gauss-Jordan elimination. Consider
the situation in which you were asked to solve two linear systems of
equations, which in matrix form can be written as $AX = B_1$ and $AX = B_2$,
where the matrix A on the left side of each equation is the same. The
solutions to the two systems are given, respectively, by


$$X = A^{-1}B_1, \qquad X = A^{-1}B_2$$


So if we compute $A^{-1}$ once, we can solve two separate linear systems by
performing simple matrix multiplication.


Example 5.1.7. Consider the following system of equations:

$$
\begin{aligned}
5x - 7y + 2z - w &= 3 \\
9x + 3y - 5z + 8w &= 11 \\
-6x + y - z + 7w &= 6 \\
x - 4y - 3z + 5w &= -1
\end{aligned}
\tag{5.9}
$$


Notice that the coefficients on the LHS of systems (5.8) and (5.9) are the same. Thus in
matrix form AX = B, the matrix A will be the same:


Solve[{5 x - 7 y + 2 z - w == 3, 9 x + 3 y - 5 z + 8 w == 11,
-6 x + y - z + 7 w == 6, x - 4 y - 3 z + 5 w == -1}, {x, y, z, w}]

{{x -> 2165/2511, y -> 604/837, z -> 7076/2511, w -> 4760/2511}}


(BHat = {{3}, {11}, {6}, {-1}}) // MatrixForm

3
11
6
-1


AInv.BHat // MatrixForm


RowReduce[Join[A, BHat, 2]] // MatrixForm
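
Because the inverse is computed only once, the two systems (5.8) and (5.9) can also be solved in a single multiplication by placing both right-hand sides side by side. The sketch below repeats the definitions of A, B, BHat, and AInv from the examples so that it runs on its own; the names rhs and both are hypothetical.

```mathematica
A = {{5, -7, 2, -1}, {9, 3, -5, 8}, {-6, 1, -1, 7}, {1, -4, -3, 5}};
B = {{-3}, {-11}, {0}, {6}};     (* RHS of system (5.8) *)
BHat = {{3}, {11}, {6}, {-1}};   (* RHS of system (5.9) *)
AInv = Inverse[A];

rhs = Join[B, BHat, 2];          (* 4 x 2 matrix whose columns are B and BHat *)
(both = AInv.rhs) // MatrixForm  (* column 1 solves (5.8), column 2 solves (5.9) *)

both[[All, {1}]] == AInv.B       (* first column matches Example 5.1.6 *)
both[[All, {2}]] == AInv.BHat    (* second column matches Example 5.1.7 *)
A.both == rhs                    (* both columns satisfy their systems *)
```

The built-in LinearSolve[A, rhs] accepts the same matrix of right-hand sides and returns the same result without forming the inverse explicitly.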


One last interesting comment before we prove the matrix inverse
properties tabled earlier in this section. In the case of the 3 x 3 matrix A as
defined previously, the determinant can be found by augmenting A with its
first two columns and then summing the three products down the
diagonals from upper left to lower right, followed by subtracting the three
products up the three diagonals from lower left to upper right.
Unfortunately, this method does not generalize to larger matrices.


A = {{a, b, c}, {d, e, f}, {g, h, i}};
Join[A, A[[All, 1 ;; 2]], 2] // MatrixForm


a b c a b
d e f d e
g h i g h

Det[A]


-c e g + b f g + c d h - a f h - b d i + a e i
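
The diagonal rule just described can be checked symbolically, and a numeric 4 x 4 matrix shows why it should not be extended. This is a sketch only: the names aug, down, up, A4, aug4, down4, and up4 are hypothetical, and the 4 x 4 matrix is the one that appears later in Example 5.2.1, whose determinant is 2976.

```mathematica
A = {{a, b, c}, {d, e, f}, {g, h, i}};
aug = Join[A, A[[All, 1 ;; 2]], 2];   (* A augmented with its first two columns *)

(* Products down the three diagonals, then up the three diagonals *)
down = Sum[aug[[1, k]] aug[[2, k + 1]] aug[[3, k + 2]], {k, 1, 3}];
up = Sum[aug[[3, k]] aug[[2, k + 1]] aug[[1, k + 2]], {k, 1, 3}];
Expand[down - up - Det[A]]            (* 0, so the rule agrees with Det *)

(* The naive analog for a 4 x 4 matrix: augment with three columns *)
A4 = {{-9, 5, 0, 2}, {4, -1, 3, 7}, {6, -2, 0, -8}, {1, 10, -4, 11}};
aug4 = Join[A4, A4[[All, 1 ;; 3]], 2];
down4 = Sum[Product[aug4[[r, k + r - 1]], {r, 4}], {k, 4}];
up4 = Sum[Product[aug4[[5 - r, k + r - 1]], {r, 4}], {k, 4}];
{down4 - up4, Det[A4]}                (* {-684, 2976}: the values differ *)
```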


Theorem 5.1.1. Let A and B be two square n x n matrices. Then, we have
the following useful and interesting matrix inverse properties.

(a) The inverse matrix $A^{-1}$ is unique, if it exists.

(b) $(AB)^{-1} = B^{-1}A^{-1}$ if both $A^{-1}$ and $B^{-1}$ exist.

(c) $(A^T)^{-1} = (A^{-1})^T$ if $A^{-1}$ exists.

Proof. (a) Assume that A has two different inverses K and L, so $K - L \neq 0$.
From Definition 5.1.1, we must have both $KA = AK = I_n$ and $LA = AL = I_n$.
Then KA = LA or (K - L)A = 0. If we multiply this last equation on the
right by L, we get (K - L)AL = 0L, or $(K - L)I_n = 0$. This last equation is,
of course, the same as K - L = 0, which is a contradiction to $K - L \neq 0$, and
so our assumption is false, which forces K = L and the inverse of A is
unique.


(b) Let both $A^{-1}$ and $B^{-1}$ exist. Then we have

$$
\begin{aligned}
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} \\
&= A I_n A^{-1} \\
&= AA^{-1} \\
&= I_n
\end{aligned}
$$


Similarly, we also have


$$
\begin{aligned}
(B^{-1}A^{-1})(AB) &= B^{-1}(A^{-1}A)B \\
&= B^{-1} I_n B \\
&= B^{-1}B \\
&= I_n
\end{aligned}
$$

So, by Definition 5.1.1 and part (a) of this theorem, $(AB)^{-1} = B^{-1}A^{-1}$.

(c) Let $A^{-1}$ exist. Then using the property of the transpose that
$(CD)^T = D^T C^T$, for any two matrices C and D, where CD exists, we
have


$$\left((A^{-1})^T A^T\right)^T = (A^T)^T \left((A^{-1})^T\right)^T = AA^{-1} = I_n$$


Therefore, we have $(A^{-1})^T A^T = I_n$ by taking the transpose of the result
given above and using the facts that $(C^T)^T = C$ and $I_n^T = I_n$. In a very
similar fashion, we can show that $A^T (A^{-1})^T = I_n$. So, by Definition 5.1.1
and part (a) of this theorem, $(A^T)^{-1} = (A^{-1})^T$.




Homework Problems 


1. Compute the transpose of the following matrices 


-2 
wf a E | | JÉ E al 
2 -4 “ye 21 9 
(a)| -1 0 )| 012 (f)| -4 2 8 
5 ES e : -1 4 -5 
0001 
0 —4 -8 —l 0 1 2 
ofi E | EEE (i) EEr 
8 5 0 1 2 TEF 


2. A matrix is symmetric if $A^T = A$. Which of the matrices from
problem 1 are symmetric?


3. A matrix is antisymmetric if $A^T = -A$. Which of the matrices
from problem 1 are antisymmetric?


4. Compute the determinants of the following matrices: 


m3 3] w] of io] 


‘ aa a B 
(g)| 9 -4 0 (h) | -4 01 
= 52 5 -2 4 


5. Compute the cofactor matrix to each of the matrices from 
problem 4. 


6. Compute the inverse matrix to each of the matrices from problem 
4, using the cofactor matrices from problem 5. 


7. Use your answers (if possible) to problem 6 to help solve the 
following systems: 




(a) 22 — 3y = 6 (b) 22 — 2y =7 (c) 22 — 3y = —1 


8r — 4y=4 5r+y=8 8r -4y = 3 
(d) 2x — 2y = 6 (e) -3r +5y+2z=1 (f) 22-3y+82 =: 
5r +y = -5 z—-3y+z=1 454-7 Sh 
97 —4y = 4 ba — 2y+4z=6 


(g) —3r+5y+2z=3 (h) 5a+4y+5z=-1 
z—3y+z= z—3y—5z=2 
9x — 4y = —2 —x+3y+5z=0 
8. Determine values of $\lambda$ such that the following matrices are not
invertible. The values of $\lambda$ that make each of the following matrices
singular are called eigenvalues. In general, eigenvalues are found by
solving for $\lambda$ the equation $\det(A - \lambda I_n) = 0$, for $A \in \mathbb{R}^{n \times n}$:


P ¢ —À 3 4 
@)| 2) eal Aa E ol 4 —-4-X +8 
s 6 -9 -10 -À 
9. A matrix A is diagonal if $A_{i,j} = 0$ for $i \neq j$. Entries on the diagonal
are not required to be nonzero; however, for this problem, assume
that $A_{i,i} \neq 0$ for $1 \le i \le n$. Show that the inverse matrix, $A^{-1}$, of A is a
diagonal matrix with entries which satisfy the following relation:


$$A^{-1}_{i,i} = \frac{1}{A_{i,i}}$$


10. A matrix A is upper triangular if $A_{i,j} = 0$ for i > j, and is lower
triangular if $A_{i,j} = 0$ for i < j. Is the inverse of a lower/upper
triangular matrix D also a lower/upper triangular matrix?


11. Compute the inverses of the following matrices: 


3 0 0 -2 0 0 š$ 00 
(a) | 0 -4 0 (b) 06 0 (c)] 0 -4 0 
0 oO -1 00 9 0 oO 2 


12. A matrix A is orthogonal if its transpose is equal to its inverse,
that is, $A^T = A^{-1}$. Explain why symmetric, antisymmetric, and
orthogonal matrices must be square.

13. Let A be a square matrix. Show that $A + A^T$ is symmetric while
$A - A^T$ is antisymmetric.


14. Let A be a square matrix. Show that A can be written as the sum
of a symmetric matrix and an antisymmetric matrix.


15. Let A be any matrix. Show that both $AA^T$ and $A^TA$ are
symmetric matrices.

16. Explain why $(AB)^T = B^T A^T$.

17. Let n be any positive integer and A be any invertible square
matrix. Show that $(A^n)^{-1} = (A^{-1})^n$.

18. Let E be an elementary matrix. Does E always have an inverse,
and if so, is $E^{-1}$ also an elementary matrix?


Mathematica Problems 


1. Solve the following systems by first converting them to the form
AX = B, and then computing $X = A^{-1}B$:


(a) 5r + 5y = 1 (b) 5r + 5y = 7 
22 + 3y=7 2r + 3y=1 
(c) — 4x + 3y = —1 (d) — 4z + 3y = —13 
Tx — 9y = —13 Tx — 9y = —1 
(e) 5r +2y +3z=0 (£) 52 + 2y + 3z = 
år- Ty-z=4 år — Ty- z = —4 
3z + 2z = —1 3r +2z=2 
(g) -4z —5y+7z=1 (h) -4x — 5y t 7z =-—] 
Tz +5y-3z=1 Tx + 5y — 3z = —1 
2x + 6y + 8z = —2 2x + 6y + 8z =2 


(i) 2w + 32+ 7y—3z=1 (j) 2w + 32 + 7y —3z=2 
8w — 2x + 5y + 8z = —2 8w — 2x + 5y + 8z = —4 
-3w — 2r —y+7z=1 —42 — 5y + 7z = 2 
l3w — 4z —- 5y +2 = 1 13w — 4r — 5y + 2 = 2 


2. Find the values of $\lambda$ so that the following matrices are singular:




0 Q =X f å 
, ee Sank, 0 
(c)| 1 5A 0 (d) L £22 SH 
5 4e A A i alea 
pai 1 4 ay A e a 
Oo -2-A § 1 L. oat =k, Sf 
(e) 9 o IA 3 (£) b iw YF 
0 0 0 2-d i et J 


3. Compute the determinant of the matrix 


5 1 2-9 8 
0 l 0 1 0 
A=]/]1 -l 0 E al 
1 0 E Pa 
3 l -6 -2 4 


using expansions along the first row for all successive determinants 
of the minors. Check that your work is correct. 

4. Compute the adjoint of the matrix A from problem 3. 

5. Compute the inverse of the matrix A from problem 3. 

6. For the two general 2 x 2 matrices 


s[i thea 4] 


show that det(AB) = det(A)det(B). 

7. Prove that det(AB) = det(A) det(B) for two arbitrary 3 x 3 
matrices. 

8. Using the generalization of the previous two problems, explain
why $C_{AB} = C_A C_B$, where these three matrices are the cofactor
matrices of the square n x n matrices A, B, and AB. Do not use
Mathematica for this problem.


9. Find the determinant and inverse for the matrix 




verifying your results in as many ways as possible. 
10. Using the matrix A from problem 9, solve in as many ways as 
possible the matrix equation AX = B, where 


11. Using the matrix A from problem 9, verify homework problem
17 for n = 5.


5.2 Finding Determinants by Expanding along Any Row or Column


We spent quite a bit of time in the last section talking about determinants
while trying to find inverses of square matrices. We now focus our
attention solely on determinants. In this section, we will find a general
method for calculating the determinant of any square n x n matrix A based
on the adjoint formula for its inverse $A^{-1}$.


Remember that the adjoint formula is given by (5.7), which we rewrite
here for completeness:


$$A^{-1} = \frac{1}{\det(A)} C^T \tag{5.10}$$




Here C is the cofactor matrix of A, and $C^T$ is the transpose matrix of C,
which means that the columns of $C^T$ are actually the rows of C. Now, if
we multiply the adjoint formula by A on the left, we have


$$I_n = AA^{-1} = \frac{1}{\det(A)} A C^T$$


which means that

$$AC^T = \det(A) I_n$$


The n x n matrix $D = \det(A) I_n$ is simply the diagonal matrix whose
diagonal entries are all the scalar value det(A):


$$D = \det(A) I_n = \begin{bmatrix} \det(A) & 0 & \cdots & 0 \\ 0 & \det(A) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \det(A) \end{bmatrix}$$


This fact now gives us n different ways to compute the determinant of A,
as we shall now see.


First, we notice that D has diagonal entries $D_{i,i} = \det(A)$ for $1 \le i \le n$.
From above, we also know that $D = AC^T$. So for i = 1, we have


$$
\begin{aligned}
\det(A) = D_{1,1} &= \sum_{j=1}^{n} A_{1,j} C^T_{j,1} \\
&= (\text{first row of } A)(\text{first column of } C^T) \\
&= (\text{first row of } A)(\text{first row of } C) \\
&= A_{1,1}C_{1,1} + A_{1,2}C_{1,2} + \cdots + A_{1,n}C_{1,n} \\
&= \sum_{j=1}^{n} A_{1,j} C_{1,j}
\end{aligned}
\tag{5.11}
$$

Notice that the last expression is simply the sum of the products of the
first row entries of A times their respective signed minors. This method of
computing the determinant of A is called expanding along its first row.
Since $\det(A) = D_{i,i}$ for any $1 \le i \le n$, we can expand along any row of A to
find its determinant. If we expand along the ith row of A, very little
changes:


$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{i,j} M_{i,j} \tag{5.12}$$

In order to efficiently use formula (5.12) to compute the determinant of A,
we should find det(A) by expanding along the row that contains the largest
number of zero entries. Remember that $D = \det(A) I_n$ has off-diagonal
entries $D_{i,j} = 0$ for all $1 \le i \neq j \le n$. Then, for example, with i = 1 and j = 2,
we have


$$
\begin{aligned}
D_{1,2} &= \sum_{j=1}^{n} A_{1,j} C^T_{j,2} \\
&= (\text{first row of } A)(\text{second column of } C^T) \\
&= (\text{first row of } A)(\text{second row of } C) \\
&= A_{1,1}C_{2,1} + A_{1,2}C_{2,2} + \cdots + A_{1,n}C_{2,n} \\
&= \sum_{j=1}^{n} A_{1,j} C_{2,j} = 0
\end{aligned}
$$


This formula tells us that the sum of the products of the first-row entries of
A times the corresponding cofactors for the second row of A will result in
a value of 0:

$$\sum_{j=1}^{n} (-1)^{2+j} A_{1,j} \det(\text{matrix } A \text{ with second row and } j\text{th column removed}) = 0$$


This generalizes to any off-diagonal location with


$$\sum_{j=1}^{n} (-1)^{k+j} A_{i,j} \det(\text{matrix } A \text{ with } k\text{th row and } j\text{th column removed}) = 0$$


for $1 \le i \neq k \le n$. However, note that the RHS of the last equation above is
0, not det(A). Therefore, we cannot derive the value of det(A) from it.
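
Both facts, the diagonal entries equal to det(A) and the vanishing off-diagonal sums, can be seen on the 3 x 3 matrix of equation (5.2). The following is a sketch only; the names A52 and cof are hypothetical and chosen for this illustration.

```mathematica
A52 = {{1, -2, 2}, {4, 5, 1}, {0, 3, -1}};   (* matrix A of (5.2), det = 8 *)

(* Cofactor matrix built from Definition 5.1.6 *)
cof = Table[(-1)^(i + j) Det[Drop[A52, {i}, {j}]], {i, 3}, {j, 3}];

A52.Transpose[cof] // MatrixForm                   (* 8 on the diagonal, 0 elsewhere *)
A52.Transpose[cof] == Det[A52] IdentityMatrix[3]   (* True *)

A52[[1]].cof[[1]]   (* first row times its own cofactors: 8 = det(A) *)
A52[[1]].cof[[2]]   (* first row times second-row cofactors: 0 *)
```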


In a similar fashion, we can start with the adjoint formula and multiply 
both sides by A on the right instead of the left. This gives 


1 
—14— T 
~ det(A) e 


which can be rewritten as 
CTA = det(A)I, 


We once again define D = det(A)/n as before. Then for i = 1, we have 


214 


det(A) = Dia = X CT Aj, 
j=l 
= (the first row of C’)(the first column of A) 
= (the first column of C)(the first column of A) 
=C)1Ai1 + C2 Å2,1 i CarAn1 


= 9 Öjan; 
q=1 


This is the sum of the products of the first-column entries of A times their 
respective cofactors. This method of computing the determinant of A is 
called expanding along its first column. Similarly to using rows, you can 
now expand along any column of A to find its determinant, not just the 
first column. If we expand along the jth column of A, then 


n 


det(A) = }0(-))' Ai Mi 
(5.13) i=] 


You should compare this definition of det (4) to the definition given in 
(5.12), and notice that only the index has changed. Again, examining 
off-diagonal entries, we have 


n 
—1)'** A; ; det(matrix A with ith row and kth column removed) = 0 
j 


i=l 


For 1 <j #4 <n. This method for finding the determinant of a square n x 
n matrix A is practical only for small values of n or if A is a sparse matrix, 
meaning that it has a very large number of zero entries. These two 
expansion methods also tell us that if the matrix A has a row or column of 
all zero entries, then its determinant is zero and A has no inverse. So now 
we have 2n ways of computing the determinant of an n x n matrix, none of 
which involve computing an inverse. 
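The 2n expansions can be collected into two small helper functions. The following is a minimal sketch rather than code from the text: the names expandRow, expandCol, and mat are hypothetical, and the 3 x 3 matrix is an arbitrary illustration.

```mathematica
(* Hypothetical helpers: cofactor expansion along row i or along column j. *)
expandRow[m_, i_] := Sum[(-1)^(i + j) m[[i, j]] Det[Drop[m, {i}, {j}]], {j, 1, Length[m]}]
expandCol[m_, j_] := Sum[(-1)^(i + j) m[[i, j]] Det[Drop[m, {i}, {j}]], {i, 1, Length[m]}]

mat = {{2, 0, 1}, {1, 3, -1}, {0, 5, 4}};

(* All 2n = 6 expansions should agree with the built-in Det. *)
Join[Table[expandRow[mat, i], {i, 3}], Table[expandCol[mat, j], {j, 3}]]
Det[mat]
```

Every entry of the resulting list should equal the value returned by Det[mat], whichever row or column is chosen.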


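Example 5.2.1. Let us find the determinant of the matrix

\[
A = \begin{pmatrix} -9 & 5 & 0 & 2 \\ 4 & -1 & 3 & 7 \\ 6 & -2 & 0 & -8 \\ 1 & 10 & -4 & 11 \end{pmatrix}
\]

by expanding along the third column since it has two zeros in it, and also along the first
row. First, we expand along the third column:

\[
\begin{aligned}
\det(A) = {} & (-1)^{1+3} \cdot 0 \cdot \det\begin{pmatrix} 4 & -1 & 7 \\ 6 & -2 & -8 \\ 1 & 10 & 11 \end{pmatrix}
+ (-1)^{2+3} \cdot 3 \cdot \det\begin{pmatrix} -9 & 5 & 2 \\ 6 & -2 & -8 \\ 1 & 10 & 11 \end{pmatrix} \\
& + (-1)^{3+3} \cdot 0 \cdot \det\begin{pmatrix} -9 & 5 & 2 \\ 4 & -1 & 7 \\ 1 & 10 & 11 \end{pmatrix}
+ (-1)^{4+3} \cdot (-4) \cdot \det\begin{pmatrix} -9 & 5 & 2 \\ 4 & -1 & 7 \\ 6 & -2 & -8 \end{pmatrix}
\end{aligned}
\]

Notice that two of the terms are zero; hence we have

\[
\det(A) = (-1)^{2+3} \cdot 3 \cdot \det\begin{pmatrix} -9 & 5 & 2 \\ 6 & -2 & -8 \\ 1 & 10 & 11 \end{pmatrix}
+ (-1)^{4+3} \cdot (-4) \cdot \det\begin{pmatrix} -9 & 5 & 2 \\ 4 & -1 & 7 \\ 6 & -2 & -8 \end{pmatrix}
\]

In a similar fashion, we compute the determinant by expanding across the first row as
follows:

\[
\begin{aligned}
\det(A) = {} & (-1)^{1+1} \cdot (-9) \cdot \det\begin{pmatrix} -1 & 3 & 7 \\ -2 & 0 & -8 \\ 10 & -4 & 11 \end{pmatrix}
+ (-1)^{1+2} \cdot 5 \cdot \det\begin{pmatrix} 4 & 3 & 7 \\ 6 & 0 & -8 \\ 1 & -4 & 11 \end{pmatrix} \\
& + (-1)^{1+4} \cdot 2 \cdot \det\begin{pmatrix} 4 & -1 & 3 \\ 6 & -2 & 0 \\ 1 & 10 & -4 \end{pmatrix}
\end{aligned}
\]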


Here, we did not bother to include the term corresponding to the 0 entry in row 1, column
3. In both expansions, we still have to compute determinants of 3 x 3 matrices to find the
determinant of A. Instead of finishing this process by hand, we will have Mathematica do
it for us. Although Mathematica does have a command called Minors, it can be rather
confusing, so we will use the Drop command to remove the required rows and columns,
and then take determinants of the resulting matrices. We first do an example
utilizing the Drop command, followed by Det:


(A = {{-9, 5, 0, 2}, {4, -1, 3, 7}, {6, -2, 0, -8}, {1, 10, -4, 11}}) // MatrixForm

\[
\begin{pmatrix} -9 & 5 & 0 & 2 \\ 4 & -1 & 3 & 7 \\ 6 & -2 & 0 & -8 \\ 1 & 10 & -4 & 11 \end{pmatrix}
\]

Drop[A, {2}, {3}] // MatrixForm

\[
\begin{pmatrix} -9 & 5 & 2 \\ 6 & -2 & -8 \\ 1 & 10 & 11 \end{pmatrix}
\]

Det[Drop[A, {2}, {3}]]
-768


In the following Mathematica commands, we will compute the determinant of A by
expansion along the third column and then also along the first row, using the Det and Drop
commands. The last two sums pair the entries of one row (or column) with the cofactors
of a different row (or column), so they evaluate to 0:


Sum[(-1)^(i + 3) A[[i, 3]] Det[Drop[A, {i}, {3}]], {i, 1, 4}]
2976

Sum[(-1)^(1 + j) A[[1, j]] Det[Drop[A, {1}, {j}]], {j, 1, 4}]
2976

Det[A]
2976

Sum[(-1)^(3 + j) A[[2, j]] Det[Drop[A, {3}, {j}]], {j, 1, 4}]
0

Sum[(-1)^(i + 3) A[[i, 1]] Det[Drop[A, {i}, {3}]], {i, 1, 4}]
0
0 


Example 5.2.2. Let us use the expansion method to check our formula for the
determinant of a 3 x 3 matrix. Let

\[
A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & k \end{pmatrix}
\]

for constants a through k. We will find the 3 x 3 determinant formula by expanding along
the first column of A and then also along the third row, verifying that they give the same
result:


A = {{a, b, c}, {d, e, f}, {g, h, k}};
Sum[(-1)^(i + 1) A[[i, 1]] Det[Drop[A, {i}, {1}]], {i, 1, 3}]
(-c e + b f) g - d (-c h + b k) + a (-f h + e k)

Simplify[%]
-c e g + b f g + c d h - a f h - b d k + a e k

Det[A]
-c e g + b f g + c d h - a f h - b d k + a e k

Sum[(-1)^(3 + j) A[[3, j]] Det[Drop[A, {3}, {j}]], {j, 1, 3}]
(-c e + b f) g - (-c d + a f) h + (-b d + a e) k

Simplify[%]
-c e g + b f g + c d h - a f h - b d k + a e k


It is important to remember that taking determinants requires many 
computations. To compute the determinant of a 3 x 3 matrix, three 2 x 2 
matrix determinants must be found. Similarly, for a 4 x 4 matrix, four 3 x 
3 determinants must be found, but for each of the four 3x3 matrices, three 
2 x 2 determinants must be computed. This would entail computing 12
2 x 2 determinants in total for a 4 x 4 matrix, assuming that there were no
zero-valued entries. Clearly, if there are any zero entries in the matrix, one 
should attempt to compute the determinant along a row or column in 
which the zero entry resides. 
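The growth in work can be made concrete with a short recursion. This is only a sketch: it assumes every entry is nonzero and counts just the 2 x 2 determinants that a full cofactor expansion reaches, and the name count2x2 is hypothetical.

```mathematica
(* Hypothetical counter: number of 2 x 2 determinants reached by a full
   cofactor expansion of an n x n matrix with no zero entries. *)
count2x2[2] = 1;
count2x2[n_Integer /; n > 2] := n count2x2[n - 1]

Table[{n, count2x2[n]}, {n, 2, 8}]
```

The count satisfies count2x2[n] = n!/2, which is why expansion by cofactors is practical only for small or sparse matrices.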


We end this section with a very important theorem relating determinants 
to the number of solutions to a square linear system. 


Theorem 5.2.1. Given the system AX = B, where A ∈ R^{n×n}, if det(A) ≠ 0,
then there exists a unique solution, X = A^{-1}B, to the system. As a
consequence, if the system has either no solution or an infinite number of
solutions, then det(A) = 0.


Proof. Clearly, if det(A) ≠ 0, then an inverse exists, due to the adjoint
formula

\[
A^{-1} = \frac{1}{\det(A)}\,C^{T}
\]

Therefore, one can perform the multiplication A^{-1}B, which is the solution
to the system. We will prove a more general form of this theorem in
Section 5.5.


Example 5.2.3. Consider the following two systems: 
(a)  2x - 4y = 4          (b)  2x - 4y = 4
    -4x + 8y = -8             -4x + 8y = -5


Notice that system (a) has an infinite number of solutions, since any solution can be
expressed in one of the following two forms, depending on which variable you wish to
solve for:

\[
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2y + 2 \\ y \end{pmatrix}
\quad \text{or} \quad
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ \tfrac{1}{2}x - 1 \end{pmatrix}
\]

On the other hand, system (b) has no solution at all, even though the coefficient matrix is
the same as that from system (a). The difference is that in system (a), the second equation
is simply a multiple of the first, which is not the case for system (b).
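A quick check of this example in Mathematica, offered as a sketch: the coefficient matrix shared by both systems has determinant 0, and row reducing the two augmented matrices separates the two cases. The names coeff, augA, and augB are chosen here for illustration only.

```mathematica
coeff = {{2, -4}, {-4, 8}};
Det[coeff]        (* zero, so Theorem 5.2.1 cannot promise a unique solution *)

augA = {{2, -4, 4}, {-4, 8, -8}};   (* augmented matrix of system (a) *)
augB = {{2, -4, 4}, {-4, 8, -5}};   (* augmented matrix of system (b) *)

RowReduce[augA]   (* second row is all zeros: infinitely many solutions *)
RowReduce[augB]   (* second row reads 0 = 1: no solution *)
```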


Finally, we give a table of important properties of determinants for
square n x n matrices, and after the table, we show with the help of Mathematica
that the product property det(AB) = det(A) det(B) is valid for arbitrary 2 x 2 matrices.


Matrix Determinant Properties 


det(A^{-1}) = 1/det(A)

det(A^T) = det(A)

det(AB) = det(A) det(B)

det(cA) = c^n det(A)


If A has a row or column of all zeros, then det(A) = 0. 
If two rows (or columns) of A are identical, then det(A) = 0. 


A = {{a, b}, {c, d}};
B = {{e, f}, {g, h}};
Det[A.B]
b c f g - a d f g - b c e h + a d e h

Expand[Det[A] Det[B]]
b c f g - a d f g - b c e h + a d e h
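The remaining formulas in the table can be checked symbolically in the same spirit. The lines below are a sketch for the 2 x 2 case only; the names m and s are chosen here (s plays the role of the scalar c, since c is already an entry of the matrix).

```mathematica
m = {{a, b}, {c, d}};

Simplify[Det[Transpose[m]] - Det[m]]     (* transpose property *)
Simplify[Det[s m] - s^2 Det[m]]          (* scalar property with n = 2 *)
Simplify[Det[Inverse[m]] - 1/Det[m]]     (* inverse property, assuming a d - b c is nonzero *)
```

Each expression should simplify to 0, which confirms the corresponding property for arbitrary 2 x 2 matrices; it is not a proof for general n.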




Homework Problems 


1. Compute the determinants of the following matrices by 
expanding along the first row: 


1 -1 1 2 -2 2 b 2: £ 4 
(a)} -1 -1 0 (b)| -2 -2 0 (c) ii D 
1 00 2 00 of j eG 
8 ygi 1 2 3 4 3: iB =i i 
1 1 62 5 6 7 8 0-1 67 
a| > +00] ©! 9 0 1 12!] lo 0 48 
= jy dy il 13 14 15 16 0 0 01 


2. Compute the determinants of the matrices from problem 1 by 
expanding along the second column. 


3. For each of the matrices in problem 1, which row or column 
would be the best choice to expand on in computing the 
determinant? 


4. Compute the determinants of the matrices from problem 1 using 
the row or column that you found in problem 3. 


5. Which of the matrices from problem 1 are singular?


6. Prove that the determinant of any upper triangular matrix U or 
lower triangular matrix L is simply the product of the diagonal 
elements: If 


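\[
U_{i,j} = 0 \text{ for } 1 \le j < i \le n \quad \text{and} \quad L_{i,j} = 0 \text{ for } 1 \le i < j \le n
\]

then

\[
\det(U) = \prod_{i=1}^{n} U_{i,i} \quad \text{and} \quad \det(L) = \prod_{i=1}^{n} L_{i,i}
\]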


7. Suppose that a matrix A can be written as A = LU, where L is 
lower triangular and U is upper triangular, specifically given as 




a be 
A=/ig 1 0 0 de 
0 0 of 


What is det(A)? This process of decomposing a matrix into the 
product of an upper and lower triangular matrix is known as LU 
factorization. 

8. Find values of λ such that the following systems of equations
have a non-trivial (nonzero) solution:


(a) -2x + (4 - λ)y = 0        (b) (-3 - λ)x + 5y = 0
    (6 - λ)x - 4y = 0             7x + (-1 - λ)y = 0

(c) -6x + (5 - λ)y = 0        (d) (2 - λ)x + 8y = 0
    (12 - λ)x + y = 0             6x + (4 - λ)y = 0


9. For each system from problem 8, find the corresponding
nontrivial solutions for each value of λ found.


10. Let c be a scalar and A be a k x l matrix. Explain why cA =
D_c A, where D_c is the k x k diagonal matrix with each diagonal
entry equal to c.


11. Use your argument from problem 10 to show that det(cA) =
c^n det(A), assuming A to be an n x n matrix and c any scalar.


12. Show that det(A^{-1}) = 1/det(A).


13. Explain why det(A^T) = det(A).
14. Let A be an n x n matrix with det(A) ≠ 0.


(a) Find a formula for det(C) in terms of n and det(A), where C
is the cofactor matrix of A.


(b) Find a formula for C^{-1} in terms of A and n if C is A's
cofactor matrix.


15. Compute the determinants of each of the three types of n x n 
elementary matrices. 


16. Let P(x_0, y_0) and Q(x_1, y_1) be two distinct points of R^2.
(a) Show that the line through these two points has the equation

\[
\det\begin{pmatrix} x_0 & y_0 & 1 \\ x_1 & y_1 & 1 \\ x & y & 1 \end{pmatrix} = 0
\]

(b) Use the formula in part (a) to find the equation of the line
through the two points P(-7, 4) and Q(9, -5).
17. Let P(x_0, y_0, z_0), Q(x_1, y_1, z_1), and R(x_2, y_2, z_2) be three
noncollinear points of R^3.
(a) Show that the plane through these three points has the
equation

\[
\det\begin{pmatrix} x_0 & y_0 & z_0 & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x & y & z & 1 \end{pmatrix} = 0
\]

(b) Use the formula in part (a) to find the equation of the plane
through the three points P(-7, 4, 2), Q(9, -5, 8), and R(6, 11, -3).
18. Let P(x_0, y_0), Q(x_1, y_1), and R(x_2, y_2) be three noncollinear points
of R^2.

(a) Show that the circle through these three points has the
equation

\[
\det\begin{pmatrix} x_0^2 + y_0^2 & x_0 & y_0 & 1 \\ x_1^2 + y_1^2 & x_1 & y_1 & 1 \\ x_2^2 + y_2^2 & x_2 & y_2 & 1 \\ x^2 + y^2 & x & y & 1 \end{pmatrix} = 0
\]

(b) Use the formula in part (a) to find the equation of the circle
through the three points P(-7, 4), Q(9, -5), and R(6, 11).
19. Can you revise problem 18 in order to find a determinant 
equation for a general conic section passing through a certain 


number of points in the xy-plane? If yes, then test your formula on a 
set of points. If no, then explain why this is impossible. 




20. Can you revise problem 18 in order to find a determinant 
equation for a sphere passing through a certain number of points in 
space? If yes, then test your formula on a set of points. If no, then 
explain why this is impossible. 

21. Verify that any 2 x 2 or 3 x 3 matrix A that has two identical 
rows (or columns) must have det (A) = 0. 


Mathematica Problems 


1. Use the Drop and Sum commands to compute the following 


determinants: 
wt 1 3 l =} 4 3 
oF Se a 9 <]F <4. 9 
@)) i o 05 i 9 6 35 
=$ -1 4 io =f —i D 
7-13 8 -5 Soe 2 ks 
“1G A if =3 1-1 628 
(c) i > (d) $- T sf oO 1 
-12 S i ; 
_7 7 2 j -9 $ AF A 
6. -5 -§ 2 A 
t 28 2 mg 3 06 01 
i $ S&S > 28 G2 73 
leh i 2 Er 3 @ (£) G32 146 
p & ţi d A 1 0 =3 4 
6a —§ & o -6 00 36 


2. Use the Det command to compute the determinants from problem 1.


3. Compute the cofactor matrix for each of the matrices in problem 1.


4. Compute the adjoint matrix for each of the matrices in problem 1. 


5. Compute the inverse matrix of each invertible matrix from 
problem 1. 


6. Find values of λ such that the following systems of equations
have a non-trivial (nonzero) solution:


(a) -2x + (-4 - λ)y = 0          (b) -3x + (-7 - λ)y = 0
    (2 - λ)x + 4y = 0                (1 - λ)x + 4y = 0


(c) -3x - 3y + (-2 - λ)z = 0     (d) 2x + (-1 - λ)y - 2z = 0
    (2 - λ)x + y + z = 0             (1 - λ)x - y + 2z = 0
    2x + (3 - λ)y + 2z = 0           2x - 2y + (2 - λ)z = 0


7. For each system from problem 6, find the corresponding
nontrivial solutions for each value of λ found.


1 
8. Using some of the 5 x 5 matrices in problem 1, and with c = —3, 
verify that the four determinant properties given in this section are 
valid. 


9. In as many ways as possible, solve the system of linear equations 
with the augmented matrix 


5 -l 3 9 0 -7 -9 
—4 2 7 1 5 0 2 
0 —8 2 -5 1 4 -5 
1 1 1 2 3 1 7 
—2 ll —4 1 2 5 0 
4 0 1 -13 1 9 1 


10. Do homework problem 16 (b), plotting the points and line 
together. 


11. Do homework problem 17 (b), plotting the points and plane 
together. 
12. Do homework problem 18 (b), plotting the points and circle 
together. 
13. Do homework problem 19, plotting the points and conic 
together. 
14. Do homework problem 20 (if you can find a formula), plotting 
the points and sphere together. 
15. (a) For two arbitrary 3 x 3 matrices A and B, verify that det(AB)
= det(A) det(B).
(b) Repeat this verification for two “randomly chosen” 4 x 4 
matrices A and B with complex entries. 




16. Verify that any 4 x 4 matrix A that has two identical rows (or 
columns) must have det(A) = 0. 


5.3 Determinants Found by Triangularizing Matrices


In the previous section, we saw how to compute determinants by 
expanding along any row or column. This expansion method allows us to 
see that if a matrix has a row or column with only one nonzero entry, then 
computing its determinant is significantly easier. We begin this section by 
introducing some terminology that you may have noticed from the 
problems at the end of Sections 5.1 and 5.2. 


Definition 5.3.1. A square matrix U is called upper triangular if, below its 
main diagonal, all of its entries are zero (i.e., U_{i,j} = 0 for all i > j).


The following are all examples of upper triangular matrices: 


p 73 2 8 9 

1-2 2 | ; E- : 0 2 8 7 -6 
3 4 Saa ae di 0 0-2 -2 2 

0 0 -2 Sk i 0 0 0 1 4 


0 0 0 0 -2 
Definition 5.3.2. A square matrix L is called lower triangular if, above its 
main diagonal, all of its entries are zero (i.e., L_{i,j} = 0 for all i < j).


The following are all examples of lower triangular matrices, which are 
simply the transposed matrices of the upper triangular examples: 


> oO wpm a 

10 0 a oe Ra = 2 & 0 0 
=3 33 6 EEP 2 8 ÆT O 
9 4: =2 s a oe S$ F 2 7. 
es 9 = 24 -2 




Definition 5.3.3. A square matrix T is called triangular if it is either upper 
or lower triangular. 


If we expand along the first column of an upper triangular matrix U and 
along first columns of the successive minors, then its determinant is the 
product of its main diagonal entries. This was left for you to prove in 
homework problem 6 of Section 5.2. If we expand along the first row of a 
lower triangular matrix L and along first rows of the successive minors,
then its determinant is also the product of its main diagonal entries. The
result of these two statements is the following theorem.


Theorem 5.3.1. If T is an n x n triangular matrix, then its determinant is
the product of its main diagonal entries:

\[
\det(T) = \prod_{i=1}^{n} T_{i,i} \tag{5.14}
\]


Let us have Mathematica check this for us in the general 3 x 3 case, as 
well as in a 6 x 6 case with numerical entries: 


(U = {{a, b, c}, {0, d, e}, {0, 0, f}}) // MatrixForm

\[
\begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{pmatrix}
\]

Sum[(-1)^(i + 1) U[[i, 1]] Det[Drop[U, {i}, {1}]], {i, 1, 3}]
a d f


(L = {{a, 0, 0}, {b, c, 0}, {d, e, f}}) // MatrixForm

\[
\begin{pmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{pmatrix}
\]

Sum[(-1)^(1 + j) L[[1, j]] Det[Drop[L, {1}, {j}]], {j, 1, 3}]
a c f

(T = {{-7, 2, 1, -5, 3, 8}, {0, 4, -1, 6, 2, 1}, {0, 0, 9, -1, 7, 2},
{0, 0, 0, -4, 10, 1}, {0, 0, 0, 0, 6, -7}, {0, 0, 0, 0, 0, 11}}) // MatrixForm

\[
\begin{pmatrix}
-7 & 2 & 1 & -5 & 3 & 8 \\
0 & 4 & -1 & 6 & 2 & 1 \\
0 & 0 & 9 & -1 & 7 & 2 \\
0 & 0 & 0 & -4 & 10 & 1 \\
0 & 0 & 0 & 0 & 6 & -7 \\
0 & 0 & 0 & 0 & 0 & 11
\end{pmatrix}
\]

Product[T[[i, i]], {i, 1, 6}]
66528


Det[T]
66528


If we can quickly transform a general square matrix A into a new 
triangular matrix T without changing the value of its determinant, then we 
would have an easy method of computing its determinant. The way to 
change a general square matrix into a triangular matrix is through the three 
elementary row operations and their corresponding matrix multiplication 
counterparts. Recall that these three row operations are (1) swap two rows, 
(2) multiply a row by a nonzero number, and finally (3) multiply a row by 
a nonzero number and add it to another row. Let us see how these three 
row operations affect the determinant of a square matrix by examining 
some examples. 


Example 5.3.1. We first focus on the elementary operation of swapping rows: 


(A = {{a, b, c}, {d, e, f}, {g, h, i}}) // MatrixForm

\[
\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
\]

Det[A]
-c e g + b f g + c d h - a f h - b d i + a e i


(ASwapRow1Row3 = {A[[3]], A[[2]], A[[1]]}) // MatrixForm

\[
\begin{pmatrix} g & h & i \\ d & e & f \\ a & b & c \end{pmatrix}
\]

Det[ASwapRow1Row3]
c e g - b f g - c d h + a f h + b d i - a e i


(ASwapRow2Row3 = {A[[1]], A[[3]], A[[2]]}) // MatrixForm

\[
\begin{pmatrix} a & b & c \\ g & h & i \\ d & e & f \end{pmatrix}
\]

Det[ASwapRow2Row3]
c e g - b f g - c d h + a f h + b d i - a e i

The swap row operation has changed the sign of the determinant of A. If
we do an even number of row swaps on A, then its determinant is
unchanged, while if we do an odd number, then its determinant changes
sign.


Example 5.3.2. Next up is the operation of multiplying a row of 
the matrix A by a scalar n: 


(AMulRow2byn = {A[[1]], n A[[2]], A[[3]]}) // MatrixForm

\[
\begin{pmatrix} a & b & c \\ d n & e n & f n \\ g & h & i \end{pmatrix}
\]

Det[AMulRow2byn]
-c e g n + b f g n + c d h n - a f h n - b d i n + a e i n


Factor[%]
-(c e g - b f g - c d h + a f h + b d i - a e i) n


From this example, we see that if we multiply a row of A by a nonzero 
number n, then the determinant of the new matrix is n times the 
determinant of A. In general, the easiest way to see this is to expand along 
the row that was multiplied by n, and then notice that a common factor of 
n can be pulled out of every term. 


Example 5.3.3. Now, we focus on the last elementary row operation. 


(AAddnTimesRow1ToRow2 = {A[[1]], A[[2]] + n A[[1]], A[[3]]}) // MatrixForm

\[
\begin{pmatrix} a & b & c \\ d + a n & e + b n & f + c n \\ g & h & i \end{pmatrix}
\]

Det[AAddnTimesRow1ToRow2]
-c e g + b f g + c d h - a f h - b d i + a e i

% - Det[A]
0


The final row operation, multiplying a row by a nonzero number n and adding it to 
another row, has no effect on the determinant of the original matrix A. If you expand in 
the new matrix along the row that has been added to, then you can see (using the 
expansion results of Section 5.2) that the new matrix has the same determinant as the
original matrix A. 


Determinant Properties with Elementary Matrices 


Elementary Matrix of Type I    (swapping of two rows)               det(E_1 A) = -det(A)

Elementary Matrix of Type II   (multiply a row by k)                det(E_2 A) = k det(A)

Elementary Matrix of Type III  (add k times a row to another row)   det(E_3 A) = det(A)
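The three rows of this table can be checked by building one elementary matrix of each type and multiplying on the left. This is a sketch for the 3 x 3 case; the names m, id, swap13, scale2, and add12 are hypothetical and chosen only for this illustration.

```mathematica
m = {{a, b, c}, {d, e, f}, {g, h, i}};
id = IdentityMatrix[3];

swap13 = id[[{3, 2, 1}]];              (* type I: swap rows 1 and 3 *)
scale2 = DiagonalMatrix[{1, k, 1}];    (* type II: multiply row 2 by k *)
add12 = id; add12[[2, 1]] = k;         (* type III: add k times row 1 to row 2 *)

Simplify[Det[swap13.m] + Det[m]]       (* 0 when det(E_1 A) = -det(A) *)
Simplify[Det[scale2.m] - k Det[m]]     (* 0 when det(E_2 A) = k det(A) *)
Simplify[Det[add12.m] - Det[m]]        (* 0 when det(E_3 A) = det(A) *)
```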


It is time to put this information together to find a way to compute general 
determinants for any square matrix A. Our approach will be to use only the 
third row operation of multiplying a row by a nonzero number and adding 
it to another row, to change A into an upper triangular matrix U. Then both 
matrices A and U have the same determinant. We know that the 
determinant of U will be the product of its main diagonal entries. This 
provides a very practical way of computing determinants, even for large 
matrices. 
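Before automating the process, here is a small hand-driven sketch of the idea on an arbitrary 3 x 3 matrix (the names m0, m1, and m2 are chosen for this illustration). Only type III operations are used, so the determinant is unchanged at every step.

```mathematica
m0 = {{2, 1, 1}, {4, 3, 3}, {8, 7, 9}};

(* Clear column 1 below the pivot 2. *)
m1 = {m0[[1]], m0[[2]] - 2 m0[[1]], m0[[3]] - 4 m0[[1]]};

(* Clear column 2 below the pivot 1. *)
m2 = {m1[[1]], m1[[2]], m1[[3]] - 3 m1[[2]]};

m2                       (* upper triangular *)
Times @@ Diagonal[m2]    (* product of the main diagonal entries *)
Det[m0]                  (* should agree with the product above *)
```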


Example 5.3.4. Now it is time for an example using a 7 x 7 matrix. We wish to make the 
process of converting a matrix to upper triangular a bit more automated. To this end, we 
have included a couple of functions that will allow us to manipulate matrices into the 
desired form. Mathematica has no built-in commands to create strictly upper triangular 
forms (there are some commands that come close, but not quite what we are looking for 
here). The first function, PivotDown, will zero out all entries in a column below a desired 
location. The second function, DetByGaussianElimination, uses PivotDown to convert 
a matrix to upper triangular, and when possible, do it only by type III row operations. Try 
to make sure that you understand how these functions work. 


PivotDown[m_, {i_, j_}, oneflag_: 0] := Block[
  {k},
  If[m[[i, j]] == 0, Return[m]];
  Return[Table[Which[
     k < i, m[[k]],
     k > i, m[[k]] - m[[k, j]]/m[[i, j]] m[[i]],
     k == i && oneflag == 0, m[[k]],
     k == i && oneflag == 1, m[[k]]/m[[i, j]]], {k, 1, Length[m]}]]]

DetByGaussianElimination[mat_] := Block[
  {row, column, matrix, m, n, k, t, swaps},
  row = 1; column = 1; matrix = mat; swaps = 0;
  {m, n} = Dimensions[mat];
  While[row < m && column < n,
   Which[
    matrix[[row, column]] != 0,
     matrix = PivotDown[matrix, {row, column}];
     row++; column++;,
    matrix[[row, column]] == 0,
     swaps++;
     If[Union[Table[matrix[[k, column]], {k, row + 1, m}]] == {0},
      swaps--; column++;,
      t = row + Position[Map[#1 != 0 &, Table[matrix[[k, column]],
            {k, row + 1, m}]], True][[1, 1]];
      matrix = Table[Which[k == row, matrix[[t]], k == t, matrix[[row]],
         True, matrix[[k]]], {k, 1, m}]]]];
  If[swaps > 0, Print["There were ", swaps, " row swap(s)"]];
  Print[MatrixForm[matrix]];
  Return[matrix]]


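(A = {{-9, 3, 0, 2, -4, 5, 1}, {6, 8, -7, -1, 2, 10, 3}, {2, 5, -1,
-15, 6, 2, -4}, {11, 7, 5, 4, -8, -1, 6}, {-7, 1, 1, 2, 5, -3, 8}, {1,
-12, 9, 0, 4, -2, 13}, {0, -9, 7, 2, 6, 3, -4}}) // MatrixForm

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
6 & 8 & -7 & -1 & 2 & 10 & 3 \\
2 & 5 & -1 & -15 & 6 & 2 & -4 \\
11 & 7 & 5 & 4 & -8 & -1 & 6 \\
-7 & 1 & 1 & 2 & 5 & -3 & 8 \\
1 & -12 & 9 & 0 & 4 & -2 & 13 \\
0 & -9 & 7 & 2 & 6 & 3 & -4
\end{pmatrix}
\]

U = DetByGaussianElimination[A];

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
0 & 10 & -7 & \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{40}{3} & \tfrac{11}{3} \\
0 & 0 & \tfrac{89}{30} & -\tfrac{1327}{90} & \tfrac{247}{45} & -\tfrac{40}{9} & -\tfrac{527}{90} \\
0 & 0 & 0 & \tfrac{18169}{267} & -\tfrac{9410}{267} & \tfrac{2554}{267} & \tfrac{7454}{267} \\
0 & 0 & 0 & 0 & \tfrac{151233}{18169} & -\tfrac{93144}{18169} & \tfrac{136380}{18169} \\
0 & 0 & 0 & 0 & 0 & \tfrac{855411}{50411} & \tfrac{693124}{50411} \\
0 & 0 & 0 & 0 & 0 & 0 & -\tfrac{6837016}{285137}
\end{pmatrix}
\]

Det[U]
61533144

Det[A]
61533144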


The determinant of the matrix U above is positive, as is the determinant of A. This shows 
that the function DetByGaussianElimination performed an even number of row swaps 
on the matrix A to get U, if any row swaps were performed at all. 


We can also have Mathematica apply the PivotDown function along the 
diagonal of A. This would allow us to carry out the process of the 
DetByGaussianElimination function step by step. PivotDown will 
multiply the ith row by an appropriate value and add it to other rows to 
place zeros in the jth column, aside from the entry about which we are 
pivoting. For our current situation, we only want entries below the 
diagonal in the jth column to be zeroed out. This will allow us to construct 
a triangular matrix instead of a rref matrix, which is all we need to 
compute the determinant. 


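(A1 = PivotDown[A, {1, 1}]) // MatrixForm

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
0 & 10 & -7 & \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{40}{3} & \tfrac{11}{3} \\
0 & \tfrac{17}{3} & -1 & -\tfrac{131}{9} & \tfrac{46}{9} & \tfrac{28}{9} & -\tfrac{34}{9} \\
0 & \tfrac{32}{3} & 5 & \tfrac{58}{9} & -\tfrac{116}{9} & \tfrac{46}{9} & \tfrac{65}{9} \\
0 & -\tfrac{4}{3} & 1 & \tfrac{4}{9} & \tfrac{73}{9} & -\tfrac{62}{9} & \tfrac{65}{9} \\
0 & -\tfrac{35}{3} & 9 & \tfrac{2}{9} & \tfrac{32}{9} & -\tfrac{13}{9} & \tfrac{118}{9} \\
0 & -9 & 7 & 2 & 6 & 3 & -4
\end{pmatrix}
\]

(A2 = PivotDown[A1, {2, 2}]) // MatrixForm

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
0 & 10 & -7 & \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{40}{3} & \tfrac{11}{3} \\
0 & 0 & \tfrac{89}{30} & -\tfrac{1327}{90} & \tfrac{247}{45} & -\tfrac{40}{9} & -\tfrac{527}{90} \\
0 & 0 & \tfrac{187}{15} & \tfrac{274}{45} & -\tfrac{548}{45} & -\tfrac{82}{9} & \tfrac{149}{45} \\
0 & 0 & \tfrac{1}{15} & \tfrac{22}{45} & \tfrac{361}{45} & -\tfrac{46}{9} & \tfrac{347}{45} \\
0 & 0 & \tfrac{5}{6} & \tfrac{11}{18} & \tfrac{25}{9} & \tfrac{127}{9} & \tfrac{313}{18} \\
0 & 0 & \tfrac{7}{10} & \tfrac{23}{10} & \tfrac{27}{5} & 15 & -\tfrac{7}{10}
\end{pmatrix}
\]

(A3 = PivotDown[A2, {3, 3}]) // MatrixForm

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
0 & 10 & -7 & \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{40}{3} & \tfrac{11}{3} \\
0 & 0 & \tfrac{89}{30} & -\tfrac{1327}{90} & \tfrac{247}{45} & -\tfrac{40}{9} & -\tfrac{527}{90} \\
0 & 0 & 0 & \tfrac{18169}{267} & -\tfrac{9410}{267} & \tfrac{2554}{267} & \tfrac{7454}{267} \\
0 & 0 & 0 & \tfrac{73}{89} & \tfrac{703}{89} & -\tfrac{446}{89} & \tfrac{698}{89} \\
0 & 0 & 0 & \tfrac{423}{89} & \tfrac{110}{89} & \tfrac{1367}{89} & \tfrac{1694}{89} \\
0 & 0 & 0 & \tfrac{1543}{267} & \tfrac{1096}{267} & \tfrac{4285}{267} & \tfrac{182}{267}
\end{pmatrix}
\]

(A4 = PivotDown[A3, {4, 4}]) // MatrixForm

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
0 & 10 & -7 & \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{40}{3} & \tfrac{11}{3} \\
0 & 0 & \tfrac{89}{30} & -\tfrac{1327}{90} & \tfrac{247}{45} & -\tfrac{40}{9} & -\tfrac{527}{90} \\
0 & 0 & 0 & \tfrac{18169}{267} & -\tfrac{9410}{267} & \tfrac{2554}{267} & \tfrac{7454}{267} \\
0 & 0 & 0 & 0 & \tfrac{151233}{18169} & -\tfrac{93144}{18169} & \tfrac{136380}{18169} \\
0 & 0 & 0 & 0 & \tfrac{67180}{18169} & \tfrac{266929}{18169} & \tfrac{310396}{18169} \\
0 & 0 & 0 & 0 & \tfrac{128962}{18169} & \tfrac{276829}{18169} & -\tfrac{30692}{18169}
\end{pmatrix}
\]

(A5 = PivotDown[A4, {5, 5}]) // MatrixForm

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
0 & 10 & -7 & \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{40}{3} & \tfrac{11}{3} \\
0 & 0 & \tfrac{89}{30} & -\tfrac{1327}{90} & \tfrac{247}{45} & -\tfrac{40}{9} & -\tfrac{527}{90} \\
0 & 0 & 0 & \tfrac{18169}{267} & -\tfrac{9410}{267} & \tfrac{2554}{267} & \tfrac{7454}{267} \\
0 & 0 & 0 & 0 & \tfrac{151233}{18169} & -\tfrac{93144}{18169} & \tfrac{136380}{18169} \\
0 & 0 & 0 & 0 & 0 & \tfrac{855411}{50411} & \tfrac{693124}{50411} \\
0 & 0 & 0 & 0 & 0 & \tfrac{988455}{50411} & -\tfrac{407828}{50411}
\end{pmatrix}
\]

(A6 = PivotDown[A5, {6, 6}]) // MatrixForm

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
0 & 10 & -7 & \tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{40}{3} & \tfrac{11}{3} \\
0 & 0 & \tfrac{89}{30} & -\tfrac{1327}{90} & \tfrac{247}{45} & -\tfrac{40}{9} & -\tfrac{527}{90} \\
0 & 0 & 0 & \tfrac{18169}{267} & -\tfrac{9410}{267} & \tfrac{2554}{267} & \tfrac{7454}{267} \\
0 & 0 & 0 & 0 & \tfrac{151233}{18169} & -\tfrac{93144}{18169} & \tfrac{136380}{18169} \\
0 & 0 & 0 & 0 & 0 & \tfrac{855411}{50411} & \tfrac{693124}{50411} \\
0 & 0 & 0 & 0 & 0 & 0 & -\tfrac{6837016}{285137}
\end{pmatrix}
\]

Det[A6]
61533144

Product[A6[[k, k]], {k, 1, 7}]
61533144

Det[A]
61533144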


In doing these six partial pivots about the first six diagonal entries of A, 
we found an upper triangular matrix, A6, whose determinant is the same as
that of A, since we did not swap any rows during the process. Clearly, 
computing the determinant of A6 is much simpler than computing the 
determinant of A. 


We now know that a square matrix A can be converted to an upper (or
lower if need be) triangular matrix T through multiplication by elementary
matrices E_1, E_2, ..., E_k, where E_1 is the first row operation applied to A
and E_k is the last:

\[
T = E_k E_{k-1} \cdots E_2 E_1 A
\]

In a similar manner, instead of triangularizing A, we can use row
operations through their elementary matrices E_1, E_2, ..., E_k to convert A to
rref form, with

\[
\operatorname{rref}(A) = E_k E_{k-1} \cdots E_2 E_1 A
\]

Since we have this equation, we can take a determinant of both sides:

\[
\det(\operatorname{rref}(A)) = \det(E_k E_{k-1} \cdots E_2 E_1 A)
\]

Using the table of matrix determinant properties in Section 5.2,
specifically det(AB) = det(A) det(B), we know that the
determinant of a product of square matrices is the product of the
determinants of the matrices. We now have

\[
\det(\operatorname{rref}(A)) = \det(E_k)\det(E_{k-1}) \cdots \det(E_2)\det(E_1)\det(A) \tag{5.15}
\]

Also, as we have just discussed, the determinant of a type III elementary
matrix is 1. Similarly, the determinant of a type II elementary matrix E,
for a nonzero scalar c, is c. Finally, the determinant of a type I elementary
matrix is -1. If we now combine this information with equation (5.15), we
have that det(A) = 0 exactly when det(rref(A)) = 0, since det(E_j) ≠ 0 for all
j. As a result of this, we can conclude with the following theorem, since
for a square matrix A, rref(A) will be the identity matrix unless rref(A)
contains at least one row of all zeros. (Why?) A row of all zeros in
rref(A) indicates that at least one row of A is a linear combination of the other
rows of A. (Why?)


Theorem 5.3.2. Given a square matrix A, det(A) = 0 if and only if at least 
one row of A is a linear combination of the other rows of A. As well, since 
det(A^T) = det(A), det(A) = 0 if and only if at least one column of A is a
linear combination of the other columns of A. 
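A small numerical illustration of Theorem 5.3.2, given as a sketch: the matrix dep below is built so that its third row is the first row plus twice the second. The names r1, r2, and dep are chosen for this example only.

```mathematica
r1 = {1, 2, 3};
r2 = {4, 5, 6};
dep = {r1, r2, r1 + 2 r2};   (* third row is a linear combination of the first two *)

Det[dep]                     (* the theorem predicts 0 *)
RowReduce[dep]               (* rref contains a row of all zeros *)
Det[Transpose[dep]]          (* same value, since det(A^T) = det(A) *)
```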


Homework Problems 


1. Compute the determinants of the following matrices: 




1 0 00 
a oe 
(a)| 0 -1 0 ag) 2 2 Be 
E gi 3 6 10 
TF dy ot 2 
oe ae E 3 0 000 
ae a Tie 1 1 000 
(c) (a| 2 9 700 
0 0 5 7 
a er Be 3 8 44 
6 oF -6 2 1 
Ss: a D 
fe Re 2700 Ó 
(e) 4 H| 0 010 0 
00 -7 3 : 
on eS 0005 -8 
0003 6 


2. Compute the determinants of the following matrices by 
converting each to upper triangular form using only type III 
elementary matrices: 


i os sa 1 1-2 8 1 4 9 5 
9-5 7 8 a ae 
JE 4 r j 3 6 2-3|©|2 -i1 5- 
12-1 -1 1 . 2 Dg 


3. Compute the determinants of the matrices from problem 2 by 
converting each to lower triangular form using only type III 
elementary matrices. 

4. Compute the determinants of the following matrices by 
converting each to upper triangular form: 


Ja 38 1 1-204 
oo b f 1213 
a)l 3 7-1 4 b) 2405 
6 2 2 -8 5 321 


5. Verify your answers to problem 4 by computing the determinant 
via the method of expanding along a row or column. 




6. Let A and B be two square matrices of any two sizes.
(a) Show that the square diagonal block matrix

\[
C = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}
\]

for appropriate size 0 matrices has det(C) = det(A) det(B).
(b) Explain why

\[
C^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & B^{-1} \end{pmatrix}
\]

As a consequence, what is the inverse of a diagonal matrix?

7. Let A be a square n x n matrix. If A can be lower/ 
upper-triangularized using only type III elementary row operations, 
then how many of these operations do you expect to need? 

8. Give an example of each type of elementary matrix for size 3 x 3, 
and give their inverses. 

9. If A is a product of elementary matrices of size n x n, then is A 
invertible? Explain your answer. 

10. Let A be a square matrix with det(A) ≠ 0. Can you find A^{-1} by
successively applying elementary matrices to A in order to produce
the n x n identity matrix I_n? If yes, explain how. If no, explain why
not.

11. For a square matrix A, explain why rref(A) will be the identity
matrix unless rref(A) contains at least one row of all zeros.


Mathematica Problems 


1. Use the DetByGaussianElimination function to convert the 
following matrices to upper triangular form: 




i-i € # i12-3 8 
a Gg 29 1 =) 4 
@)) 3 9 1 7 l3 6 1 -1 
Si & & ÜA <b 4 
L-i —§ A Aa E E 
<a o 123-6 7 
mM) se | al 2 9 7 38 =-1 
9 
ak E == § a Eg 
6 7 -5 2 8 
3 3 125 % 222 44 
> & £6 8 a ¥ 6 &@ p 
f)| E 6-732] ®] EE 1 0 « 
—2 -2 48 3 $3 3 0 58 -8 
8 6 48 3 5 -1 0 3 6 


2. Use the PivotDown function to convert the matrices from 
problem 1 to upper triangular form. 


3. Compare your answers from problems 1 and 2.
4. Compute the determinants of the matrices from problem 1. 
5. Let 


wale col sl- seo 


no ol Nie Cle 


ja 
wits loo aa poe 


Ie ape OS Ble 


Notice that this matrix has all rational entries instead of just integers 
that we have been used to using. 
(a) Find the smallest positive scalar c so that A = cB where B is 
4 x 4 also, but B has all integer entries. 
(b) Now triangularize both A and B in order to compute their 
determinants. Is det(A) = c^4 det(B)?
(c) Using the two 4 x 4 matrices in part (b), triangularize their
product AB and see if det(AB) = det(A) det(B).
(d) Compute A? and triangularize it. Is det(A”) = det(4)° ? 




5.4 LU Factorization 


We now take a slight detour from our study of determinants and focus on 
the process of triangularization that we introduced in the previous section. 
If we turn our attention to how we obtained the upper triangular matrix U 
from A, we will notice that we “threw away” some information. More 
specifically, if we look at the 7 x 7 matrix A from Section 5.3, the 
information we did not keep was the multiplicative factors for the pivot 
row to cancel out all entries below the diagonal. 


We can construct a second matrix L that contains these multiplicative 
factors, with a sign change, on the lower diagonal region located in the 
same position as the entry that was zeroed out. This will result in a lower 
triangular matrix that will be the L in the LU factorization of A, where U is
our upper triangular matrix conversion of A through pivoting. The two 
triangular matrices L and U will now factor A as A = LU. 
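The bookkeeping just described can be sketched as a single function. This is an illustrative sketch, not the text's own code: the name luNoPivot is hypothetical, and it assumes that every pivot encountered is nonzero, so that no row swaps are needed.

```mathematica
(* Hypothetical helper: LU factorization by type III row operations only.
   Assumes each pivot u[[c, c]] is nonzero. *)
luNoPivot[a_] := Module[{n = Length[a], u = a, l, factor},
  l = IdentityMatrix[n];
  Do[
   factor = u[[r, c]]/u[[c, c]];        (* multiple of the pivot row to subtract *)
   u[[r]] = u[[r]] - factor u[[c]];     (* zero out the entry below the pivot *)
   l[[r, c]] = factor,                  (* record the factor in the same position of L *)
   {c, 1, n - 1}, {r, c + 1, n}];
  {l, u}]

mat = {{2, 1, 1}, {4, 3, 3}, {8, 7, 9}};
{lower, upper} = luNoPivot[mat];
lower // MatrixForm
upper // MatrixForm
lower.upper == mat
```

Storing the subtracted factor directly is the same as storing the added multiple with its sign changed, which matches the description above.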


Example 5.4.1. We start with a smaller (than 7 x 7) example, which will be more 
tractable to help illustrate this process, with the use of Mathematica. 


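\[
A = \begin{pmatrix} 3 & 4 & -2 & 5 \\ 2 & -7 & -8 & 9 \\ 1 & 4 & 2 & -5 \\ -2 & 3 & 4 & -1 \end{pmatrix}
\]

A = {{3, 4, -2, 5}, {2, -7, -8, 9}, {1, 4, 2, -5}, {-2, 3, 4, -1}};
L = IdentityMatrix[4];
(A1 = {A[[1]], A[[2]] - (2/3) A[[1]], A[[3]], A[[4]]}) // MatrixForm

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 1 & 4 & 2 & -5 \\ -2 & 3 & 4 & -1 \end{pmatrix}
\]

L[[2, 1]] = 2/3;
(A2 = {A1[[1]], A1[[2]], A1[[3]] - (1/3) A1[[1]], A1[[4]]}) // MatrixForm

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 0 & \tfrac{8}{3} & \tfrac{8}{3} & -\tfrac{20}{3} \\ -2 & 3 & 4 & -1 \end{pmatrix}
\]

L[[3, 1]] = 1/3;
(A3 = {A2[[1]], A2[[2]], A2[[3]], A2[[4]] - (-2/3) A2[[1]]}) // MatrixForm

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 0 & \tfrac{8}{3} & \tfrac{8}{3} & -\tfrac{20}{3} \\ 0 & \tfrac{17}{3} & \tfrac{8}{3} & \tfrac{7}{3} \end{pmatrix}
\]

L[[4, 1]] = -2/3;
(A4 = {A3[[1]], A3[[2]], A3[[3]] - (-8/29) A3[[2]], A3[[4]]}) // MatrixForm

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 0 & 0 & \tfrac{24}{29} & -\tfrac{148}{29} \\ 0 & \tfrac{17}{3} & \tfrac{8}{3} & \tfrac{7}{3} \end{pmatrix}
\]

L[[3, 2]] = -8/29;
(A5 = {A4[[1]], A4[[2]], A4[[3]], A4[[4]] - (-17/29) A4[[2]]}) // MatrixForm

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 0 & 0 & \tfrac{24}{29} & -\tfrac{148}{29} \\ 0 & 0 & -\tfrac{36}{29} & \tfrac{164}{29} \end{pmatrix}
\]

L[[4, 2]] = -17/29;
(A6 = {A5[[1]], A5[[2]], A5[[3]], A5[[4]] - (-36/24) A5[[3]]}) // MatrixForm

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 0 & 0 & \tfrac{24}{29} & -\tfrac{148}{29} \\ 0 & 0 & 0 & -2 \end{pmatrix}
\]

L[[4, 3]] = -36/24;
L // MatrixForm

\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ \tfrac{2}{3} & 1 & 0 & 0 \\ \tfrac{1}{3} & -\tfrac{8}{29} & 1 & 0 \\ -\tfrac{2}{3} & -\tfrac{17}{29} & -\tfrac{3}{2} & 1 \end{pmatrix}
\]

L.A6 // MatrixForm

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 2 & -7 & -8 & 9 \\ 1 & 4 & 2 & -5 \\ -2 & 3 & 4 & -1 \end{pmatrix}
\]

We now have factored the matrix A into the product of lower and upper triangular
matrices:

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 2 & -7 & -8 & 9 \\ 1 & 4 & 2 & -5 \\ -2 & 3 & 4 & -1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ \tfrac{2}{3} & 1 & 0 & 0 \\ \tfrac{1}{3} & -\tfrac{8}{29} & 1 & 0 \\ -\tfrac{2}{3} & -\tfrac{17}{29} & -\tfrac{3}{2} & 1 \end{pmatrix}
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 0 & 0 & \tfrac{24}{29} & -\tfrac{148}{29} \\ 0 & 0 & 0 & -2 \end{pmatrix}
\]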


Now, we must ask the following question: Why is this factorization of any use? As 
always, our focus is on solving systems of equations of the form AX = B. What we now
have is a system of the form LUX = B. There must be an advantage to this form even 
though it appears to be potentially more complicated, due to extra matrix multiplications. 
To solve for X in LUX = B, we perform a two-part process: (1) solve the system LY = B 
for the unknown column matrix Y, and (2) solve the system UX = Y. The second step 
arises from the fact that we simply performed the substitution Y = UX in the original 
equation to end up with LY = B. 
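The two-part process can be sketched as a pair of short functions. These are illustrative only: the names forwardSub and backSub are hypothetical, right-hand sides are passed as flat lists, and both functions assume nonzero diagonal entries. The small matrices lower and upper are arbitrary test data.

```mathematica
(* Hypothetical helper: solve l.y == b for lower triangular l, top row first. *)
forwardSub[l_, b_] := Module[{n = Length[b], y},
  y = ConstantArray[0, n];
  Do[y[[i]] = (b[[i]] - Sum[l[[i, j]] y[[j]], {j, 1, i - 1}])/l[[i, i]], {i, 1, n}];
  y]

(* Hypothetical helper: solve u.x == y for upper triangular u, bottom row first. *)
backSub[u_, y_] := Module[{n = Length[y], x},
  x = ConstantArray[0, n];
  Do[x[[i]] = (y[[i]] - Sum[u[[i, j]] x[[j]], {j, i + 1, n}])/u[[i, i]], {i, n, 1, -1}];
  x]

lower = {{1, 0, 0}, {2, 1, 0}, {4, 3, 1}};
upper = {{2, 1, 1}, {0, 1, 1}, {0, 0, 2}};
rhs = {1, 2, 3};

yvec = forwardSub[lower, rhs];     (* step 1: L Y = B *)
xvec = backSub[upper, yvec];       (* step 2: U X = Y *)
(lower.upper).xvec == rhs          (* check against the original system *)
```

Each step touches only one triangle of entries, which is why the pair of triangular solves is cheaper than it first appears.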


We apply this technique to our example. First, we will construct a linear system of 
equations involving our matrix A by picking a particular value of B: 


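\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 2 & -7 & -8 & 9 \\ 1 & 4 & 2 & -5 \\ -2 & 3 & 4 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} -1 \\ 2 \\ -3 \\ -1 \end{pmatrix}
\]

As instructed by the first step in this process, we start by solving the system:

\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ \tfrac{2}{3} & 1 & 0 & 0 \\ \tfrac{1}{3} & -\tfrac{8}{29} & 1 & 0 \\ -\tfrac{2}{3} & -\tfrac{17}{29} & -\tfrac{3}{2} & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}
=
\begin{pmatrix} -1 \\ 2 \\ -3 \\ -1 \end{pmatrix}
\]

The method of solving this system is known as forward substitution. The first row of the
system of equations above is simply y_1 = -1. Now that we know the value of y_1, notice
that the second row is (2/3) y_1 + y_2 = 2. We forward the value of y_1 into the second row to
obtain y_2, yielding y_2 = 8/3. In a similar fashion, now that y_1 and y_2 are known, y_3 can be
solved for in row 3. The last variable y_4 can be solved for from row 4 to give a final
solution set of

\[
y_1 = -1, \quad y_2 = \tfrac{8}{3}, \quad y_3 = -\tfrac{56}{29}, \quad y_4 = -3
\]

Moving on to the second step in the process, we have the system

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ 0 & -\tfrac{29}{3} & -\tfrac{20}{3} & \tfrac{17}{3} \\ 0 & 0 & \tfrac{24}{29} & -\tfrac{148}{29} \\ 0 & 0 & 0 & -2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} -1 \\ \tfrac{8}{3} \\ -\tfrac{56}{29} \\ -3 \end{pmatrix}
\]

which can be solved in a similar fashion, except by backward substitution. We start with
the last equation, which is -2x_4 = -3, solve for x_4, back up one row, determine x_3, and so
on. Our final solution is given by

\[
x_1 = \tfrac{22}{3}, \quad x_2 = -\tfrac{25}{6}, \quad x_3 = \tfrac{83}{12}, \quad x_4 = \tfrac{3}{2}
\]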

We will perform all of these steps in Mathematica next: 




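X = {{x1}, {x2}, {x3}, {x4}}; 
Y = {{y1}, {y2}, {y3}, {y4}}; 
B = {{-1}, {2}, {-3}, {-1}}; 
L.Y // MatrixForm 

\[
\begin{pmatrix} y_1 \\ \frac{2 y_1}{3} + y_2 \\ \frac{y_1}{3} - \frac{8 y_2}{29} + y_3 \\ -\frac{2 y_1}{3} - \frac{17 y_2}{29} - \frac{3 y_3}{2} + y_4 \end{pmatrix}
\]

Flatten[Ysoln = Y /. Solve[L.Y == B, {y1, y2, y3, y4}][[1]]] 

{-1, 8/3, -56/29, -3}

A6.X // MatrixForm 

\[
\begin{pmatrix} 3 x_1 + 4 x_2 - 2 x_3 + 5 x_4 \\ -\frac{29 x_2}{3} - \frac{20 x_3}{3} + \frac{17 x_4}{3} \\ \frac{24 x_3}{29} - \frac{148 x_4}{29} \\ -2 x_4 \end{pmatrix}
\]

Flatten[Xsoln = X /. Solve[A6.X == Ysoln, {x1, x2, x3, x4}][[1]]] 

{22/3, -25/6, 83/12, 3/2}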




RowReduce[Join[A, B, 2]] // MatrixForm 


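\[
\begin{pmatrix} 1 & 0 & 0 & 0 & \frac{22}{3} \\ 0 & 1 & 0 & 0 & -\frac{25}{6} \\ 0 & 0 & 1 & 0 & \frac{83}{12} \\ 0 & 0 & 0 & 1 & \frac{3}{2} \end{pmatrix}
\]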


LU factorization is commonly used in numerical applications because of 
its efficiency and ease of implementation. For instance, forward or 
backward substitution can be performed in a very systematic and 
programmatically simple way. Furthermore, all information regarding the 
L and U matrices can be stored in one matrix, which can be a very 
important factor when dealing with large matrices and when memory 
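becomes an issue. Consider our previous example. Provided we remember 
that the diagonal of L consists of all ones, both matrices L and U can be 
stored in the single matrix: 

\[
\begin{pmatrix} 3 & 4 & -2 & 5 \\ \frac{2}{3} & -\frac{29}{3} & -\frac{20}{3} & \frac{17}{3} \\ \frac{1}{3} & -\frac{8}{29} & \frac{24}{29} & -\frac{148}{29} \\ -\frac{2}{3} & -\frac{17}{29} & -\frac{3}{2} & -2 \end{pmatrix}
\]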
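
As a rough sketch of how the two factors can be recovered from this compact storage, the built-in commands LowerTriangularize and UpperTriangularize can split the single matrix back into L and U. The names LUpacked, Lfac, and Ufac are illustrative only and are not used elsewhere in the text.

```mathematica
(* The single storage matrix from above (LUpacked is an illustrative name) *)
LUpacked = {{3, 4, -2, 5}, {2/3, -29/3, -20/3, 17/3},
   {1/3, -8/29, 24/29, -148/29}, {-2/3, -17/29, -3/2, -2}};

(* entries strictly below the diagonal, plus the remembered diagonal of ones *)
Lfac = LowerTriangularize[LUpacked, -1] + IdentityMatrix[4];
(* entries on and above the diagonal *)
Ufac = UpperTriangularize[LUpacked];

Lfac.Ufac
(* {{3, 4, -2, 5}, {2, -7, -8, 9}, {1, 4, 2, -5}, {-2, 3, 4, -1}} *)
```

The product reproduces the original matrix A, so nothing is lost by storing only the one matrix.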


The one problem that we have not dealt with is when pivoting is required 
to transform our matrix to a triangular one. This would require a type I 
elementary matrix. There are methods of dealing with this problem, which 
usually involve a pivoting matrix P, and the resulting equation is given by 
PA = LU. The idea is to multiply A on the left by an elementary matrix of 
type I, or a combination of elementary matrices of type I, so that the 
resulting matrix can be decomposed into LU form. 


It should come as no surprise that Mathematica has a built-in LU 
factorization command, LUDecomposition, which returns a list of three 
items: a single matrix holding both the lower and upper triangular 
factors, a pivot list, and a condition number estimate that applies to 
approximate numerical matrices. Note in the 
commands below that Mathematica comes up with an LU factorization 




different from the one we found by hand, and as discussed previously, the 
upper and lower triangular matrices are both stored in the same matrix: 


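({lu, p, c} = LUDecomposition[A])[[1]] // MatrixForm 

\[
\begin{pmatrix} 1 & 4 & 2 & -5 \\ 3 & -8 & -8 & 20 \\ 2 & \frac{15}{8} & 3 & -\frac{37}{2} \\ -2 & -\frac{11}{8} & -1 & -2 \end{pmatrix}
\]

p 
{3, 1, 2, 4} 

(l = lu SparseArray[{i_, j_} /; j < i -> 1, {4, 4}] + IdentityMatrix[4]) 
// MatrixForm 

\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ 2 & \frac{15}{8} & 1 & 0 \\ -2 & -\frac{11}{8} & -1 & 1 \end{pmatrix}
\]

(u = lu SparseArray[{i_, j_} /; j >= i -> 1, {4, 4}]) // MatrixForm 

\[
\begin{pmatrix} 1 & 4 & 2 & -5 \\ 0 & -8 & -8 & 20 \\ 0 & 0 & 3 & -\frac{37}{2} \\ 0 & 0 & 0 & -2 \end{pmatrix}
\]

l.u // MatrixForm 

\[
\begin{pmatrix} 1 & 4 & 2 & -5 \\ 3 & 4 & -2 & 5 \\ 2 & -7 & -8 & 9 \\ -2 & 3 & 4 & -1 \end{pmatrix}
\]

A[[p]] // MatrixForm 

\[
\begin{pmatrix} 1 & 4 & 2 & -5 \\ 3 & 4 & -2 & 5 \\ 2 & -7 & -8 & 9 \\ -2 & 3 & 4 & -1 \end{pmatrix}
\]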


Homework Problems 


1. Solve the following systems by forward or backward substitution: 


49 = © zı 5g 
0 2 7 -9 T? = 3 
©] 9 o -2 10 el | °F 
00 0 5 4 ay 
1 000 yı 1 
<5 A @ 2 y |_| oO 
4) 3 210 "i | = 
A <3 5 I Y4 3 


2. Compute the LU factorization of the following matrices: 




13 doy § 4 <@ og 
4 = 4 1 2 8 
(@)) os 1 -1 Om)  ® 4 Hi 
A he Ü el 0 -2 >ý 
a wa $ 1 000 
2 F 7 D =- 16 6 
OI 3 -§ ~2 10 d] 3 210 
6 81 $ “4 3 5 i 
1-39 5 8 i ml a 
am i mee ay t f oh = 
(} 3 9 1-7 OJ 3 9 = 
0-3 5 1 a ee 1 


3. Compute the determinant of each of the matrices from problem 2. 


4. Solve the following systems of equations by LU factorization, 
performing forward and backward substitution. 


L3 
-2 1 
()| 9 8 
-2 1 


(b) 


8 

4 

—1 

0 

-1 
o|- 
6 


l 
-2 
1 
0 


-7 

6 
| 
-1 


-9 

8 
=ł1 
-9 


5 
a9 
10 
5 




Ta 


| | 
nuwe 
oOrnrWN KK OC Se | 


Mathematica Problems 


1. Compare your answers to homework problem 2 with the 
LUDecomposition command. 


2. Given $L, U \in \mathbb{R}^{n \times n}$ and $X, Y, B \in \mathbb{R}^{n \times 1}$, construct a Mathematica 
procedure to perform forward substitution for the system LY = B 
and backward substitution for the system UX = Y. 


3. Using your algorithm from Mathematica problem 2, solve each 
of the systems from homework problem 1. 


4. Construct an algorithm that will perform LU decomposition on a 
square matrix A, with the results of both L and U stored in a single 
matrix. 


5. Using your algorithms from Mathematica problems 2 and 4, 
solve each of the systems from homework problem 4. 


5.5 Inverses from rref 


The purpose of this very short section is to find a very fast way of 
computing inverses to square matrices. The adjoint formula is very useful 
theoretically, but too cumbersome to be practical as a general 
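computational tool. If we start with a square matrix A of dimension n × n, 
then, by the adjoint formula, assuming det(A) ≠ 0, A has an inverse $A^{-1}$ 
with $AA^{-1} = I_n$. Defining $A^{-1}_j$ as the jth column of $A^{-1}$, we find that the 
result of the matrix multiplication $AA^{-1}_j$ is simply the jth column of $I_n$: 

\[
\begin{pmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n,1} & A_{n,2} & \cdots & A_{n,n} \end{pmatrix}
\begin{pmatrix} A^{-1}_{1,j} \\ A^{-1}_{2,j} \\ \vdots \\ A^{-1}_{n,j} \end{pmatrix}
=
\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}
\tag{5.16}
\]

Here the single 1 on the right sits in row j. 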


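As a result of this matrix equation, we end up with the following n 
equations in the n unknowns $A^{-1}_{i,j}$ for $1 \le i \le n$: 

\[
\begin{aligned}
A_{1,1} A^{-1}_{1,j} + A_{1,2} A^{-1}_{2,j} + \cdots + A_{1,n} A^{-1}_{n,j} &= 0 \\
A_{2,1} A^{-1}_{1,j} + A_{2,2} A^{-1}_{2,j} + \cdots + A_{2,n} A^{-1}_{n,j} &= 0 \\
&\ \ \vdots \\
A_{j,1} A^{-1}_{1,j} + A_{j,2} A^{-1}_{2,j} + \cdots + A_{j,n} A^{-1}_{n,j} &= 1 \\
&\ \ \vdots \\
A_{n,1} A^{-1}_{1,j} + A_{n,2} A^{-1}_{2,j} + \cdots + A_{n,n} A^{-1}_{n,j} &= 0
\end{aligned}
\]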


We can now use rref to solve this system of equations for the unknown 
entries in the column matrix $A^{-1}_j$ if we apply rref to the augmented matrix 
$(A \mid j\text{th column of } I_n)$. This process of finding each of the columns of $A^{-1}$ 
using rref can be combined into doing rref once on the single augmented 
matrix $(A \mid I_n)$. We obtain 

\[
\operatorname{rref}(A \mid I_n) = (I_n \mid A^{-1}) \tag{5.17}
\]

We can start with the simple case of $A \in \mathbb{R}^{2 \times 2}$. Given 

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]

we know that 

\[
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]

So we will attempt to compute the inverse of A by starting with the 
augmented matrix 

\[
\left(\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right)
\]

and performing elementary row operations on $(A \mid I_2)$ to arrive at the RHS 
of equation (5.17). 




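First, under the assumption that a ≠ 0, we can divide row 1 by a to get 

\[
\left(\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ c & d & 0 & 1 \end{array}\right)
\]

Next, we take −c times row 1 and add it to row 2, giving: 

\[
\left(\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ 0 & \frac{ad - bc}{a} & -\frac{c}{a} & 1 \end{array}\right)
\]

We now need to make the entry in row 2, column 2 a value of 1. To do 
this, we multiply row 2 by $\frac{a}{ad - bc}$: 

\[
\left(\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ 0 & 1 & -\frac{c}{ad - bc} & \frac{a}{ad - bc} \end{array}\right)
\]

Next, we take $-\frac{b}{a}$ times row 2 and add it to row 1: 

\[
\left(\begin{array}{cc|cc} 1 & 0 & \frac{1}{a} + \frac{bc}{a(ad - bc)} & -\frac{b}{ad - bc} \\ 0 & 1 & -\frac{c}{ad - bc} & \frac{a}{ad - bc} \end{array}\right)
\]

With a little algebraic manipulation, we see that $\frac{1}{a} + \frac{bc}{a(ad - bc)} = \frac{d}{ad - bc}$; 
therefore, we have 

\[
\left(\begin{array}{cc|cc} 1 & 0 & \frac{d}{ad - bc} & -\frac{b}{ad - bc} \\ 0 & 1 & -\frac{c}{ad - bc} & \frac{a}{ad - bc} \end{array}\right)
\tag{5.19}
\]

Notice that after removing the identity from this matrix (i.e., the first two 
columns), what remains is $A^{-1}$, giving us the identity in equation (5.17). 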




Once we know how to perform this process by hand, we revert to 
Mathematica for more complicated examples. We will be making use of 
the Drop command, which will allow us to remove the identity from 
$(I_n \mid A^{-1})$ to obtain the inverse matrix $A^{-1}$. 
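
Before the larger example, the whole procedure can be tried on a small numerical matrix. The matrix A2 and the names aug and rr below are chosen here purely for illustration and are not from the text; the sketch row reduces $(A \mid I_2)$, drops the identity columns, and compares the result with the built-in Inverse command.

```mathematica
(* A small invertible matrix chosen for illustration (its determinant is 1) *)
A2 = {{2, 1}, {5, 3}};
aug = Join[A2, IdentityMatrix[2], 2];   (* the augmented matrix (A2 | I2) *)

rr = RowReduce[aug]
(* {{1, 0, 3, -1}, {0, 1, -5, 2}} *)

Drop[rr, {}, {1, 2}]    (* keep all rows, remove columns 1 through 2 *)
(* {{3, -1}, {-5, 2}} *)

Drop[rr, {}, {1, 2}] == Inverse[A2]
(* True *)
```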


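Example 5.5.1. We will use the 7 × 7 matrix A from Example 5.3.4 to illustrate the 
method of inverses from rref on a matrix with square dimension larger than 2 × 2. The 
IdentityMatrix command gives us an easy way to create the 7 × 7 identity matrix $I_7$: 


(A = {{-9, 3, 0, 2, -4, 5, 1}, {6, 8, -7, -1, 2, 10, 3}, {2, 5, -1, 
-15, 6, 2, -4}, {11, 7, 5, 4, -8, -1, 6}, {-7, 1, 1, 2, 5, -3, 8}, {1, 
-12, 9, 0, 4, -2, 13}, {0, -9, 7, 2, 6, 3, -4}}) // MatrixForm 

\[
\begin{pmatrix}
-9 & 3 & 0 & 2 & -4 & 5 & 1 \\
6 & 8 & -7 & -1 & 2 & 10 & 3 \\
2 & 5 & -1 & -15 & 6 & 2 & -4 \\
11 & 7 & 5 & 4 & -8 & -1 & 6 \\
-7 & 1 & 1 & 2 & 5 & -3 & 8 \\
1 & -12 & 9 & 0 & 4 & -2 & 13 \\
0 & -9 & 7 & 2 & 6 & 3 & -4
\end{pmatrix}
\]

(Id7 = IdentityMatrix[7]) // MatrixForm 

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

(AId7 = Join[A, Id7, 2]) // MatrixForm 

\[
\left(\begin{array}{ccccccc|ccccccc}
-9 & 3 & 0 & 2 & -4 & 5 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
6 & 8 & -7 & -1 & 2 & 10 & 3 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & 5 & -1 & -15 & 6 & 2 & -4 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
11 & 7 & 5 & 4 & -8 & -1 & 6 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
-7 & 1 & 1 & 2 & 5 & -3 & 8 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & -12 & 9 & 0 & 4 & -2 & 13 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & -9 & 7 & 2 & 6 & 3 & -4 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right)
\]

RRAId7 = RowReduce[AId7]; 
Drop[RRAId7, {}, {1, 7}] // MatrixForm 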


_ 46491 19218 — -5100 16811 — 20348 3743 4576 
854627 854627 854627 854627 854627 854627 854627 
167953 — _ 3487 139939 263815 719173 __ -1063685 474377 
20511048 5127762 5127762 5127762 10255524 20511048 20511048 
287649 — 62953 67669 114941 102269 — 67277 492449 
6837016 1709254 1709254 1709254 3418508 6837016 6837016 
— _78861 42124 — -143255 11334 220933 — 132403 145687 
3418508 2563881 2563881 854627 5127762 3418508 3418508 
277067 67468 23186 9733 215842 160049 299573 
5127762 2563881 2563881 2563881 2563881 5127762 5127762 
315005 43409 8399 _ _ 9755 — _96527 101957 173281 
5127762 854627 2563881 2563881 2563881 5127762 5127762 
59431 117211 20699 9 145177 329485 285137 
6837016 5127762 5127762 1709254 10255524 6837016 6837016 


Inverse[A] // MatrixForm 


_ 46491 19218 _ 5100 16811 _ 20348 3743 4576 
854627 854627 854627 854627 854627 854627 854627 
167953. _ _3487 139939 263815 719173 1063685. _474377_ 

20511048 5127762 5127762 5127762 10255524 20511048 20511048 
287649  _ _62953 67669 114941 102269 _ _67277 492449 

6837016 1709254 1709254 1709254 3418508 6837016 6837016 

_ 78861 42124. _ 143255 11334 220933 _ 132403 145687 
3418508 2563881 2563881 854627 5127762 3418508 3418508 

_ 277067 67468 23186 _ _9733 215842 _ 160049 299573 
5127762 2563881 2563881 2563851 25638851 5127762 5127762 
315005 43409 8399 9755 96527 101957 173281 
5127762 854627 3563881 2563881 ~ 2563881 5127762 5127762 
59431 117211 ___ 20699 9 145177 329485 285137 

6837016 5127762 5127762 1709254 10255524 6837016 6837016 


The inverse matrix to A is the same as the matrix we get from rref (A|In) above after 
deleting its first seven columns. 


The following theorem is a continuation of Theorem 5.2.1 on solutions to 
square linear systems, and it follows nicely from the material that we have 
developed in this section. 


Theorem 5.5.1. Given the system AX = B, where $A \in \mathbb{R}^{n \times n}$, det(A) ≠ 0 if 
and only if there exists a unique solution to the system for all B. In this 
case the unique solution to the system is $X = A^{-1}B$. As a consequence, the 
system has either no solution, or an infinite number of solutions, exactly 
when det(A) = 0. 


Proof. If you look at the statement of Theorem 5.2.1, it tells us half of 
what we want to prove, namely, if det(A) ≠ 0, then the system has a unique 
solution $X = A^{-1}B$. This means that half of our work is already done, since 
we now need only show that if the system has a unique solution for all B, 
then det(A) ≠ 0. Now, let B be any column of the identity matrix $I_n$. 
Then we can solve all of the equations AX = B for their unique solutions 
using rref $(A \mid I_n)$. The only possible way we can have unique solutions in 
this situation is when rref $(A \mid I_n) = (I_n \mid C)$, for some n × n matrix C. This 
last rref computation also tells us that rref(A) = $I_n$. Then det(rref(A)) = 1, 
and so det(A) ≠ 0, since det(rref(A)) is a nonzero multiple of det(A) from 
the proof of Theorem 5.3.2. That is, det(A) ≠ 0 exactly when det(rref(A)) ≠ 0. 
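
The last sentence of Theorem 5.5.1 can be seen on a small singular matrix. The matrix S and the two right-hand sides below are chosen here only for illustration; this is a sketch, not an example from the text.

```mathematica
S = {{1, 2}, {2, 4}};    (* an illustrative singular matrix *)
Det[S]
(* 0 *)

(* right-hand side {1, 3}: the last row reads 0 = 1, so there is no solution *)
RowReduce[Join[S, {{1}, {3}}, 2]]
(* {{1, 2, 0}, {0, 0, 1}} *)

(* right-hand side {1, 2}: a zero row appears, so there are infinitely many solutions *)
RowReduce[Join[S, {{1}, {2}}, 2]]
(* {{1, 2, 1}, {0, 0, 0}} *)
```

Which of the two cases occurs depends on B, but a unique solution never appears when the determinant is 0.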


Homework Problems 


1. Use the method of row reducing $(A \mid I_n)$ to compute the inverse of 
each of the following matrices: 


1 -i 2 
-1 3 -1 
1 2 -2 


-1 2 1 
(d) 1 —2 1 
0 10 


(a) 


-1 210 13 -5 2 120 1 
ae ata 14 £0] wl 203 3 
@) 2 1| S gs 21] Vlaewi 
2-561 -3 6 -4 3 020 4 


2. In this section, it was shown that row reducing the matrix $(A \mid I_2)$ 
resulted in the correct value of $A^{-1}$. The first step in this process 
required that a ≠ 0. Repeat this procedure, but this time assume that 
a = 0. You may assume that b ≠ 0 and c ≠ 0. Remember that you 
cannot swap rows. 

3. If A is square but has no inverse, then what does rref $(A \mid I_n)$ 
produce, and how does it tell us that A has no inverse? Give some 
examples. 

4. Explain why if E is an elementary matrix, $E^{-1}$ is also. 

5. Explain why for any square matrix A that is invertible, $A^{-1}$ is a 
product of elementary matrices. 

6. Explain why for any square matrix A that is invertible, A is a 
product of elementary matrices. 

7. Express the matrix in problem 1 part (c), and its inverse, as a 
product of elementary matrices. 




8. Is Theorem 5.5.1 still valid if we replace “for all B” with “for at 
least one B”? 


Mathematica Problems 


1. Verify your answers to homework problem 1 by using the 
RowReduce command on the matrices $(A \mid I_n)$. 

2. Compute $AA^{-1}$ for each of the inverses found from Mathematica 
problem 1, verifying that row reducing $(A \mid I_n)$ yields $(I_n \mid A^{-1})$. 

3. Let A be a 3 x 3 matrix and B be a 4 x 4 matrix, both from 
homework problem 1. Determine if the following identity holds: 


wo([$ sie lle bo) 


Explain why this should or should not work. 


5.6 Cramer’s Rule 


We now discuss a final method of solving linear systems of equations. 
Cramer’s rule will allow us to solve a square system of linear equations 
AX = B when the matrix of coefficients A is invertible. Remember that to 
be invertible, we require that det(A) ≠ 0. Cramer’s rule will allow us to 
compute the solutions to the system using only determinants. We will 
explore the 2 × 2 case to see how this is possible. Let 

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\]

then the solution is given by $X = A^{-1}B$. Let us look carefully at this 
solution X using Mathematica, and see if it is possible to write this 
solution in a form that only involves determinants: 




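(A = {{a, b}, {c, d}}) // MatrixForm 

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]

(B = {{α}, {β}}) // MatrixForm 

\[
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\]

(X = Inverse[A].B) // MatrixForm 

\[
\begin{pmatrix} \frac{d\alpha}{-bc + ad} - \frac{b\beta}{-bc + ad} \\ -\frac{c\alpha}{-bc + ad} + \frac{a\beta}{-bc + ad} \end{pmatrix}
\]

Simplify[Det[A] X] // MatrixForm 

\[
\begin{pmatrix} d\alpha - b\beta \\ -c\alpha + a\beta \end{pmatrix}
\]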


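Notice that if we multiply the solution X by det(A), then the two entries of 
this new column each look like determinants of 2 × 2 matrices. In fact, 
notice that the first entry of this new column matrix is 

\[
d\alpha - b\beta = \det \begin{pmatrix} \alpha & b \\ \beta & d \end{pmatrix}
\]

while the second is 

\[
-c\alpha + a\beta = \det \begin{pmatrix} a & \alpha \\ c & \beta \end{pmatrix}
\]

Thus, our solution X can be written using determinants as $X = [x \ y]^T$ where 

\[
x = \frac{\det \begin{pmatrix} \alpha & b \\ \beta & d \end{pmatrix}}{\det(A)}, \qquad
y = \frac{\det \begin{pmatrix} a & \alpha \\ c & \beta \end{pmatrix}}{\det(A)}
\tag{5.20}
\]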


This method of using determinants to solve square linear systems is called 
Cramer’s rule, and it works on any size square system. Let us now check 
this with an example. 


Example 5.6.1. We wish to solve the linear system 


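\[
\begin{aligned} 5x + 13y &= -8 \\ 4x - 7y &= 2 \end{aligned}
\]

whose matrix of coefficients A and column of RHS values B are given by 

\[
A = \begin{pmatrix} 5 & 13 \\ 4 & -7 \end{pmatrix}, \qquad B = \begin{pmatrix} -8 \\ 2 \end{pmatrix}
\]

A = {{5, 13}, {4, -7}}; 
B = {{-8}, {2}}; 
(Xnum = Join[B, Drop[A, {}, {1}], 2]) // MatrixForm 

\[
\begin{pmatrix} -8 & 13 \\ 2 & -7 \end{pmatrix}
\]

(Ynum = Join[Drop[A, {}, {2}], B, 2]) // MatrixForm 

\[
\begin{pmatrix} 5 & -8 \\ 4 & 2 \end{pmatrix}
\]

(X = {{Det[Xnum]/Det[A]}, {Det[Ynum]/Det[A]}}) // MatrixForm 

\[
\begin{pmatrix} -\frac{10}{29} \\ -\frac{14}{29} \end{pmatrix}
\]

Inverse[A].B // MatrixForm 

\[
\begin{pmatrix} -\frac{10}{29} \\ -\frac{14}{29} \end{pmatrix}
\]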

This example verifies that Cramer’s rule does indeed work in the 2 x 2 
case. Let us now see if it works in the 3 x 3 case with a general 
computation, and then a numerical example: 


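(A = {{a, b, c}, {d, e, f}, {g, h, i}}) // MatrixForm 

\[
\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
\]

(B = {{α}, {β}, {δ}}) // MatrixForm 

\[
\begin{pmatrix} \alpha \\ \beta \\ \delta \end{pmatrix}
\]

X = Inverse[A].B; Simplify[Det[A] X] // MatrixForm 

\[
\begin{pmatrix}
-f h \alpha + e i \alpha + c h \beta - b i \beta - c e \delta + b f \delta \\
f g \alpha - d i \alpha - c g \beta + a i \beta + c d \delta - a f \delta \\
-e g \alpha + d h \alpha + b g \beta - a h \beta - b d \delta + a e \delta
\end{pmatrix}
\]

Det[{{α, b, c}, {β, e, f}, {δ, h, i}}] 
-f h α + e i α + c h β - b i β - c e δ + b f δ 

Det[{{a, α, c}, {d, β, f}, {g, δ, i}}] 
f g α - d i α - c g β + a i β + c d δ - a f δ 

Det[{{a, b, α}, {d, e, β}, {g, h, δ}}] 
-e g α + d h α + b g β - a h β - b d δ + a e δ 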

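Note that if we multiply the solution X by det(A) = aei − afh − dbi + dch + 
gbf − gce, then the three entries of this new column matrix each look like 
determinants of 3 × 3 matrices: 

\[
\begin{aligned}
\alpha e i - \alpha f h - \beta b i + \beta c h + \delta b f - \delta c e &= \det \begin{pmatrix} \alpha & b & c \\ \beta & e & f \\ \delta & h & i \end{pmatrix} \\
-\alpha d i + \alpha f g + \beta a i - \beta c g - \delta a f + \delta c d &= \det \begin{pmatrix} a & \alpha & c \\ d & \beta & f \\ g & \delta & i \end{pmatrix} \\
\alpha d h - \alpha e g - \beta a h + \beta b g + \delta a e - \delta b d &= \det \begin{pmatrix} a & b & \alpha \\ d & e & \beta \\ g & h & \delta \end{pmatrix}
\end{aligned}
\]

Thus, our solution $X = [x \ y \ z]^T$ can be written using determinants as 

\[
x = \frac{\det \begin{pmatrix} \alpha & b & c \\ \beta & e & f \\ \delta & h & i \end{pmatrix}}{\det(A)}, \qquad
y = \frac{\det \begin{pmatrix} a & \alpha & c \\ d & \beta & f \\ g & \delta & i \end{pmatrix}}{\det(A)}, \qquad
z = \frac{\det \begin{pmatrix} a & b & \alpha \\ d & e & \beta \\ g & h & \delta \end{pmatrix}}{\det(A)}
\tag{5.21}
\]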

This is again Cramer’s rule at work on a 3 x 3 square linear system. Now, 
let us check this 3 x 3 case with an example. 


Example 5.6.2. We want to solve the linear system 


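\[
\begin{aligned} 5x + 13y - 9z &= -8 \\ 4x - 7y + 6z &= 2 \\ -x + 11y - 4z &= 0 \end{aligned}
\]

Its matrix of coefficients A, and column of RHS values B are given by 

\[
A = \begin{pmatrix} 5 & 13 & -9 \\ 4 & -7 & 6 \\ -1 & 11 & -4 \end{pmatrix}, \qquad B = \begin{pmatrix} -8 \\ 2 \\ 0 \end{pmatrix}
\]

A = {{5, 13, -9}, {4, -7, 6}, {-1, 11, -4}}; 
B = {{-8}, {2}, {0}}; 
(XVar = ReplacePart[A, {i_, 1} :> B[[i, 1]]]) // MatrixForm 

\[
\begin{pmatrix} -8 & 13 & -9 \\ 2 & -7 & 6 \\ 0 & 11 & -4 \end{pmatrix}
\]

(YVar = ReplacePart[A, {i_, 2} :> B[[i, 1]]]) // MatrixForm 

\[
\begin{pmatrix} 5 & -8 & -9 \\ 4 & 2 & 6 \\ -1 & 0 & -4 \end{pmatrix}
\]

(ZVar = ReplacePart[A, {i_, 3} :> B[[i, 1]]]) // MatrixForm 

\[
\begin{pmatrix} 5 & 13 & -8 \\ 4 & -7 & 2 \\ -1 & 11 & 0 \end{pmatrix}
\]

{Det[XVar]/Det[A], Det[YVar]/Det[A], Det[ZVar]/Det[A]} 

{-70/131, 46/131, 144/131}

Flatten[Inverse[A].B] 

{-70/131, 46/131, 144/131}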



In both the 2 x 2 and 3 x 3 cases, it was shown that the kth variable’s 
solution could be expressed as a ratio of determinants. The numerator is 
the determinant of the matrix obtained by replacing column k of A with B. 
The denominator is simply the determinant of A. 


To generalize this process, given the system AX = B with $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times 1}$, 
and $X = [x_1, x_2, \ldots, x_n]^T$, the solution is given by 

\[
x_1 = \frac{\det(A_{1,B})}{\det(A)}, \quad x_2 = \frac{\det(A_{2,B})}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_{n,B})}{\det(A)}
\tag{5.22}
\]

where $A_{k,B}$ is the matrix obtained by replacing column k of A with B. To 
see exactly how this works, we reexamine the adjoint formula. Under the 
assumption that A is invertible, the solution to AX = B, with the help 
of the adjoint formula (5.7), is given by 

\[
X = \frac{1}{\det(A)} C^T B
\]

If we explicitly write out the formula for the ith entry $X_i$ in the column 
matrix X, we have that 

\[
\begin{aligned}
X_i &= \frac{1}{\det(A)} \left( C^T_{i,1} B_1 + C^T_{i,2} B_2 + \cdots + C^T_{i,n} B_n \right) \\
    &= \frac{1}{\det(A)} \left( C_{1,i} B_1 + C_{2,i} B_2 + \cdots + C_{n,i} B_n \right)
\end{aligned}
\tag{5.23}
\]

Notice that the sum in parentheses in the second line of this expression is 
exactly the expansion, along column i, of the determinant of the matrix A 
with column i replaced with the column matrix B. This is how we end up 
with equation (5.22), Cramer’s rule. 
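
Equation (5.22) translates almost directly into Mathematica. The following is a minimal sketch rather than a command from the text: the name cramer is hypothetical, the right-hand side is assumed to be a flat list, and det(A) ≠ 0 is assumed without being checked.

```mathematica
(* cramer is a hypothetical helper; b is a flat list of right-hand-side values *)
cramer[a_, b_] := Table[
  (* replacing row k of the transpose replaces column k of a with b *)
  Det[Transpose[ReplacePart[Transpose[a], k -> b]]]/Det[a],
  {k, Length[a]}]

(* the 2 x 2 system of Example 5.6.1 *)
cramer[{{5, 13}, {4, -7}}, {-8, 2}]
(* {-10/29, -14/29} *)
```

Working on the transpose is only a design convenience here: ReplacePart replaces a whole row in a single step, so no pattern over the row index is needed.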


Example 5.6.3. We do one last example with a 4 x 4 system: 


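\[
\begin{pmatrix} -7 & -1 & 2 & 3 \\ -2 & 0 & -5 & 4 \\ -2 & 3 & 0 & 1 \\ 0 & 4 & -3 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} -2 \\ 3 \\ 1 \\ -2 \end{pmatrix}
\]

Using the formula given in (5.22), we have that 

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \frac{1}{\det(A)}
\begin{pmatrix} \det(A_{1,B}) \\ \det(A_{2,B}) \\ \det(A_{3,B}) \\ \det(A_{4,B}) \end{pmatrix}
\]

where det(A) = 61 and 

\[
\begin{aligned}
\det(A_{1,B}) &= \det \begin{pmatrix} -2 & -1 & 2 & 3 \\ 3 & 0 & -5 & 4 \\ 1 & 3 & 0 & 1 \\ -2 & 4 & -3 & 1 \end{pmatrix} = 399 \\
\det(A_{2,B}) &= \det \begin{pmatrix} -7 & -2 & 2 & 3 \\ -2 & 3 & -5 & 4 \\ -2 & 1 & 0 & 1 \\ 0 & -2 & -3 & 1 \end{pmatrix} = 60 \\
\det(A_{3,B}) &= \det \begin{pmatrix} -7 & -1 & -2 & 3 \\ -2 & 0 & 3 & 4 \\ -2 & 3 & 1 & 1 \\ 0 & 4 & -2 & 1 \end{pmatrix} = 347 \\
\det(A_{4,B}) &= \det \begin{pmatrix} -7 & -1 & 2 & -2 \\ -2 & 0 & -5 & 3 \\ -2 & 3 & 0 & 1 \\ 0 & 4 & -3 & -2 \end{pmatrix} = 679
\end{aligned}
\]

After computing the determinants of the four $A_{k,B}$ matrices, we arrive at our solution: 

\[
X = \frac{1}{61} \begin{pmatrix} 399 \\ 60 \\ 347 \\ 679 \end{pmatrix}
\]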


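Once again, we will have Mathematica compute the $A_{j,B}$ values: 


A = {{-7, -1, 2, 3}, {-2, 0, -5, 4}, {-2, 3, 0, 1}, {0, 4, -3, 1}}; 
B = {{-2}, {3}, {1}, {-2}}; 
(A1B = ReplacePart[A, {i_, 1} :> B[[i, 1]]]) // MatrixForm 

\[
\begin{pmatrix} -2 & -1 & 2 & 3 \\ 3 & 0 & -5 & 4 \\ 1 & 3 & 0 & 1 \\ -2 & 4 & -3 & 1 \end{pmatrix}
\]

(A2B = ReplacePart[A, {i_, 2} :> B[[i, 1]]]) // MatrixForm 

\[
\begin{pmatrix} -7 & -2 & 2 & 3 \\ -2 & 3 & -5 & 4 \\ -2 & 1 & 0 & 1 \\ 0 & -2 & -3 & 1 \end{pmatrix}
\]

(A3B = ReplacePart[A, {i_, 3} :> B[[i, 1]]]) // MatrixForm 

\[
\begin{pmatrix} -7 & -1 & -2 & 3 \\ -2 & 0 & 3 & 4 \\ -2 & 3 & 1 & 1 \\ 0 & 4 & -2 & 1 \end{pmatrix}
\]

(A4B = ReplacePart[A, {i_, 4} :> B[[i, 1]]]) // MatrixForm 

\[
\begin{pmatrix} -7 & -1 & 2 & -2 \\ -2 & 0 & -5 & 3 \\ -2 & 3 & 0 & 1 \\ 0 & 4 & -3 & -2 \end{pmatrix}
\]

{Det[A1B]/Det[A], Det[A2B]/Det[A], Det[A3B]/Det[A], Det[A4B]/Det[A]} 

{399/61, 60/61, 347/61, 679/61}

Flatten[Inverse[A].B] 

{399/61, 60/61, 347/61, 679/61}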


Homework Problems 


1. Solve the following systems of equations by using Cramer’s rule: 


t J a 
ft & F Tı 2 

3 2 4 T zz | 3 

()| i 23 -2 a a 
2 06 4 T4 -5 

t 2 g4 zı 0 

i A @ aq |_| 2 

4} ai 3 7 8 a i |2 
1 —2 -2 5 T4 1 


2. Given the following matrices 




jaj l =j 
Esla 1 -1],B=]| 3 
i oğ 7 


define $A_{i,B}$ to be as specified in this section, and $(A_i \mid B)$ to be the 
matrix found after removing column i from A and then augmenting 
the resulting matrix with B. Compute the determinants of the 
following matrices: 


(a) A (b) $A_{1,B}$ (c) $A_{2,B}$ (d) $A_{3,B}$ (e) $(A_1 \mid B)$ 


(f) $(A_2 \mid B)$ (g) $(A_3 \mid B)$ 


3. Determine the relationship between $\det(A_{i,B})$ and $\det((A_i \mid B))$, and 
use it to verify your results from problem 2. 


4. Let a be a scalar in the matrix equation 


a l gr} _|4 

2 3 yi 616 
For what values of a does this system have one solution? For what 
values of a does this system have no solution? For what values of a 


does this system have infinitely many solutions? 
5. Let a be a scalar, in the matrix equation 


4 1 T 4 
a 3 a Y =| D 
0 -1 5 z 6 


For what values of a does this system have one solution? For what 
values of a does this system have no solution? For what values of a 
does this system have infinitely many solutions? 


6. For the matrices 


_[3-é 142% _ [ 345i 
ae [Bf 8]. 2 [38 




use Cramer’s Rule to solve the system AX = B. 


Mathematica Problems 


1. Solve the following systems of equations by using Cramer’s rule: 


3 1 -74 3 Tı 3 
1-2 62 -2 To 2 
(a) 8 1 -1 1 0 T3 = 9 
D £ =i Se -1 
4% £2 4 T5 2 
S$ -6 -0 3 1 ry 9 
4 2 $ 6 2 T2 1 
W=} 1 =1 =1 7 zz |=] -1 
0 1 4 7 7 T4 0 
as -k -b B T = | 
l1 -1 5 T -3 2 Tı 3 
3 24 7 3 —4 T2 —3 
(c) =d 2 3 -2 1 3 z3 | _ 4 
—2 06 4 9 9 T4 0 
6 -2 9 2 0 3 T5 4 
=% 0 3 -5 8 -8 Te 1 
1 2 3 -4 -3 0 ml -2 
S = U £ A XQ 3 
12-3 4 1 2 T3 5 
(d) —l 3 7 8 4 -1 a 1 
i-2 2 5§ = 5 T5 2y 
0 -2 => 9-5 7 Te 1 


2. Using your answer to homework problem 3, solve the systems of 
problem without swapping rows 


3. Use Cramer’s rule to solve the system AX = B, for A and B 
defined as follows: 




3-i 
5+i 
3+i 
6-1 


1 + 2% 
1—4 
7+ 2 
2-1 


i Tri 
4 "fi 
i10 4+% 
8—t 7+i 




3+5i 
2 — bi 
—2i 
-4 +i 


Chapter 6 
Basic Vector Algebra Topics 


6.1 Vectors 


The mathematical idea of a vector comes from physics and the ideas of 
force, velocity, and acceleration. A force (or velocity and acceleration) in 
physics has two characteristics, the first is strength (or magnitude), which 
must be nonnegative, and the second is direction. If the acceleration on an 
object is due only to the force of gravity, then the direction of gravity’s 
acceleration is straight down toward the center of Earth and its magnitude 
is almost constant near Earth’s surface, at 32.2 ft/s². A vector is any 
quantity for which the two defining traits of magnitude (norm) and 
direction can be realized mathematically. 


We will denote a vector as a variable with an arrow over it (e.g., $\vec{v}$). 
When specifically defining a vector, we will use angled brackets. As an 
example, $\vec{v} = \langle 1, -1, 2, 1 \rangle$ is a vector in $\mathbb{R}^4$. You should think of $\vec{v}$ as 
the arrow starting at the origin (0, 0, 0, 0) as its base point in $\mathbb{R}^4$ and 
stopping at the point (1, −1, 2, 1) in $\mathbb{R}^4$. The magnitude of this vector $\vec{v}$, 
denoted $|\vec{v}|$, is the length of the arrow, which is 

\[
|\vec{v}| = \sqrt{1^2 + (-1)^2 + 2^2 + 1^2} = \sqrt{7}
\]

The direction of this vector $\vec{v}$ is given by the arrow that represents it. We 
have assumed here that the distance d(P, Q) between two points $P, Q \in \mathbb{R}^n$ 
is given by 

\[
d(P, Q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}
\tag{6.1}
\]

if $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$, which generalizes the 
distance formula in $\mathbb{R}^2$. 
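
As a quick check of this computation, and assuming vectors and points are entered as plain lists, the built-in commands Norm and EuclideanDistance reproduce the magnitude and formula (6.1). The name vec is illustrative only.

```mathematica
vec = {1, -1, 2, 1};    (* the example vector, entered as a list *)

Norm[vec]
(* Sqrt[7] *)

(* formula (6.1) applied to the endpoint and the origin *)
EuclideanDistance[vec, {0, 0, 0, 0}]
(* Sqrt[7] *)

(* the same distance written out term by term *)
Sqrt[Total[(vec - {0, 0, 0, 0})^2]]
(* Sqrt[7] *)
```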


This example also shows that we can have vectors in dimensions greater 
than three. One of the first things to note is that there is only one vector of 
magnitude 0, no matter what dimension our vector may be in. This vector 
is called the zero vector and denoted $\vec{0}$, given as $\langle 0, 0 \rangle$ in $\mathbb{R}^2$ and 
$\langle 0, 0, 0, 0 \rangle$ in $\mathbb{R}^4$, for example. The zero vector $\langle 0, 0, \ldots, 0 \rangle \in \mathbb{R}^n$ is the only vector 
for which direction does not play a role. 


Now that we have a definition of vectors, we can think of real Euclidean 
n-space as a vector space. Instead of n-tuples of points, we will have 
vectors of dimension n. 


Definition 6.1.1. The vector space $\mathbb{R}^n$ is the set of vectors $\vec{p}$ represented 
by arrows starting at the base point of the origin (0, 0, ..., 0) in the point 
set $\mathbb{R}^n$ and stopping at the point $P = (p_1, p_2, \ldots, p_n)$ in the point set $\mathbb{R}^n$, 
and $\vec{p}$ is written as $\langle p_1, p_2, \ldots, p_n \rangle$ with n components $p_1, p_2, \ldots, p_n$: 

\[
\mathbb{R}^n = \{ \langle p_1, p_2, \ldots, p_n \rangle \mid (p_1, p_2, \ldots, p_n) \in \mathbb{R}^n \}
\]

The notation and context will tell us whether $\mathbb{R}^n$ is the vector space or the 
point set. The vector space $\mathbb{R}^n$ mathematically turns out to be the same as 
the set of column (or row) matrices in $\mathbb{R}^{n \times 1}$ (or $\mathbb{R}^{1 \times n}$), as we shall see 
shortly, since in the vector space $\mathbb{R}^n$ we can add two vectors and also 
multiply them by real scalars to get new vectors in $\mathbb{R}^n$. For a more precise 
description of properties of a vector space, we refer you to the table in 
Section 8.1. 


Definition 6.1.2. For $\vec{p} = \langle p_1, p_2, \ldots, p_n \rangle \in \mathbb{R}^n$, the magnitude, or length, 
of $\vec{p}$ is defined to be 

\[
|\vec{p}| = \sqrt{p_1^2 + p_2^2 + \cdots + p_n^2}
\]

A vector $\vec{p}$ is called a unit vector if $|\vec{p}| = 1$. $\mathbb{R}^n$ is called a vector space 
because we will be able to do arithmetic in it. The vector space $\mathbb{R}^n$ is said 
to have dimension n since it takes n independent components to form each 
element of $\mathbb{R}^n$. 


We now return to our discussion of vectors. The easiest way to 
geometrically represent a vector $\vec{v}$ in $\mathbb{R}^2$ or $\mathbb{R}^3$, with the two properties 
of magnitude and direction, is that of an arrow with the length of the 
arrow the magnitude of $\vec{v}$, and the direction of the arrow the direction of 
$\vec{v}$. Two vectors are said to have the same direction if their arrows are 
parallel, and when their arrows are drawn from a common starting or base 
point, one of the arrows follows exactly over the other. Two vectors are 
considered equal if they have the same identical arrows representing them 
when started at the same base point. Two vectors are said to have opposite 
directions to each other if, when each is represented by arrows with the 
same base point, they point in opposite directions along the same line. If 
$\vec{v}$ is a vector, then $-\vec{v}$ is the vector of the same magnitude as $\vec{v}$, but 
pointing in the opposite direction. 


If c is a real number and $\vec{v}$ is a vector, then we define $c\vec{v}$ to be the 
vector of magnitude $|c\vec{v}| = |c| \, |\vec{v}|$, where $\vec{v}$ and $c\vec{v}$ have the same 
direction if c > 0 and have opposite directions if c < 0. When c = 0, then 
$c\vec{v}$ is the zero vector for any vector $\vec{v}$. Thus, $5\vec{v}$ is a vector five times 
the magnitude of $\vec{v}$ and in the same direction as $\vec{v}$, while $-5\vec{v}$ is a 
vector five times the magnitude of $\vec{v}$ and in the opposite direction to $\vec{v}$. 
Two vectors $\vec{v}$ and $\vec{w}$ are said to be parallel if they are multiples of each 
other, that is, if there is a real number c so that $\vec{w} = c\vec{v}$ (or $\vec{v} = c\vec{w}$). 
This also implies that two vectors are parallel exactly when their 
respective arrows are parallel, or equivalently, they are parallel exactly 
when they determine a pair of parallel lines in $\mathbb{R}^n$. 


Now, let us illustrate these concepts. Remember that an arrow represents 
the same vector no matter its starting or base point, as long as it has the 
same magnitude (length) and direction. The Mathematica command 
Arrow, a graphics primitive, allows us to plot arrows to represent vectors, 
while we denote vectors simply as lists in Mathematica, treating them as 
row matrices. 




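Example 6.1.1. Let $\vec{v} = \langle 13, 0 \rangle \in \mathbb{R}^2$ and $\vec{w} = \langle 9, 15 \rangle \in \mathbb{R}^2$. Then $2\vec{v} = \langle 26, 0 \rangle$ 
and $-3\vec{w} = \langle -27, -45 \rangle$. The graphs of the vector $\vec{w}$ with base point at the origin, $\vec{w}$ and $2\vec{v}$ with base point 
P(−7, −19), and $\vec{v}$ and $-3\vec{w}$ with base point Q(20, 14) are depicted in Figure 6.1. 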


Figure 6.1: Graphical depiction of several vectors at three base 
points. 


P = {-7, -19}; 
Q = {20, 14}; 
V = {13, 0}; 
W = {9, 15}; 
TwoV = 2 V 

{26, 0} 

NegThreeW = -3 W 

{-27, -45} 

ArrowPlots = Graphics[{Arrowheads[.10], Thickness[.010], Blue, 
Arrow[{{0, 0}, W}], Blue, Arrow[{P, W + P}], Red, Arrow[{Q, V + Q}], 
Red, Arrow[{P, TwoV + P}], Blue, Arrow[{Q, NegThreeW + Q}]}]; 

TxtPlot = Graphics[{Black, Text["W", {2, 7}], Text["V", {25, 12}], 
Text["2V", {6, -22}], Text["-3W", {14, -2}], Text["W", {-4, -10}]}]; 
Show[ArrowPlots, TxtPlot] 




Now that we have the geometric interpretation of vectors well in hand, let 
us move on to vector addition and subtraction. The key to these two 
operations is the parallelogram law, which states geometrically how to 
add vectors together and is based on how, physically, two forces are 
combined to create a single force. In order to add two vectors, $\vec{v}$ and $\vec{w}$, 
we represent them as arrows starting at the same base point, and then the 
arrow representing their sum $\vec{v} + \vec{w}$ starts at this common base point and 
stops at the point that is at the opposite end of a diagonal for the 
parallelogram with $\vec{v}$ and $\vec{w}$ as two adjacent sides. The graph depicted in 
Figure 6.2 will clarify how this works. 


Example 6.1.2. Let $\vec{v} = \langle -10, 17 \rangle \in \mathbb{R}^2$ and $\vec{w} = \langle 35, 8 \rangle \in \mathbb{R}^2$. Then, $\vec{v} + \vec{w} = \langle 25, 25 \rangle$. 
We will plot these vectors with Mathematica: 


Figure 6.2: The parallelogram formed by the two vectors $\vec{v}$ and 
$\vec{w}$ and the resulting sum of the two vectors. 




V = {-10, 17}; 

W = {35, 8}; 

ArrowPlots = Graphics[{Arrowheads[.10], Thickness[.010], Black, 
Arrow[{{0, 0}, W}], Black, Arrow[{{0, 0}, V}], Red, Arrow[{{0, 0}, V + W}]}]; 

TxtPlots = Graphics[{Black, Text["V", {-4, 11}], Text["W", {16, 
1}], Text["V+W", {16, 11}]}]; 


LinePlots = Graphics[{Thickness[.005], Blue, Line[{V, V + W}], 
Blue, Line[{W, V + W}]}]; 


Show[ArrowPlots, TxtPlots, LinePlots] 


Note that for any vector $\vec{v}$, we should define $\vec{v} + (-\vec{v}) = \vec{0}$ since, 
physically, if you combine two opposite but equal forces, they cancel each 
other out to a force of $\vec{0}$. You may begin to notice that many of the rules 
of applying operations to real numbers have corresponding counterparts 
when dealing with vectors. Try to keep this in mind as you read through 
this chapter. 


Now in order to subtract vectors, we write $\vec{v} - \vec{w} = \vec{v} + (-\vec{w})$, and then 
perform addition that we already know how to do. 


Example 6.1.3. Let $\vec{v} = \langle -10, 17 \rangle \in \mathbb{R}^2$ and $\vec{w} = \langle 35, 8 \rangle \in \mathbb{R}^2$ as in Example 6.1.2. 
Then $\vec{v} - \vec{w} = \langle -45, 9 \rangle$. We will plot both $\vec{v}$ and $\vec{w}$, along with $\vec{v} - \vec{w}$ and the 
diagonal of the parallelogram formed by $\vec{v}$ and $\vec{w}$. We will use the same first five 
commands as those of Example 6.1.2, which we will not reproduce here. 


Figure 6.3: The difference of two vectors $\vec{v}$ and $\vec{w}$. 




ArrowPlots2 = Graphics[{Arrowheads[.11], Thickness[.010], Black, 
Arrow[{{0, 0}, W}], Black, Arrow[{{0, 0}, V}], Black, Arrow[{{0, 
0}, -W}], Red, Arrow[{W, W + (V - W)}], Red, Arrow[{{0, 0}, 
V - W}]}]; 


TxtPlots2 = Graphics[{Black, Text["V", {-0.5, 7}], Text["W", {18, 
2.5}], Text["-W", {-16, -6}], Text["V - W", {18, 14}], Text["V 
- W", {-20, 6}]}]; 

LinePlots2 = Graphics[{Thickness[.005], Blue, Line[{V, V - W}], 
Blue, Line[{V - W, -W}]}]; 

Show[ArrowPlots2, LinePlots2, TxtPlots2, AspectRatio -> 1] 


In Figure 6.3, notice that the vector $\vec{v} - \vec{w}$ completes the third side of the triangle that 
has the two vectors $\vec{v}$ and $\vec{w}$ as adjacent sides. Depicted here are two arrows 
corresponding to $\vec{v} - \vec{w}$; they have the same magnitude and direction, and are therefore 
equivalent. 


We now move on to the discussion of linear combinations of vectors. 
Recall that we defined a linear equation in Definition 2.1.2. In a similar 
fashion we define a linear combination of vectors. 


Definition 6.1.3. A linear combination $\vec{w}$ of vectors 
$\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k \in \mathbb{R}^n$ is another vector in $\mathbb{R}^n$ of the form 

\[
\vec{w} = a_1 \vec{v}_1 + a_2 \vec{v}_2 + \cdots + a_k \vec{v}_k
\]

for real scalars $a_1, a_2, \ldots, a_k$. Linear combinations are similarly defined for 
vectors from $\mathbb{C}^n$. 




The simplest case to consider is a linear combination, $a\vec{v} + b\vec{w}$, of two vectors $\vec{v}$ and $\vec{w}$ for real constants a and b. Note that when discussing vectors, it is traditional to call real constants scalars and to call multiplication of a vector by a real constant scalar multiplication.


Any two nonzero vectors $\vec{v}$ and $\vec{w}$ that are not parallel determine the unique plane P that passes through both $\vec{v}$ and $\vec{w}$ as arrows starting at the same base point of the origin, and this plane P consists of all the arrows starting at the origin formed by the vectors of their linear combinations $a\vec{v} + b\vec{w}$, that is, $P = \{a\vec{v} + b\vec{w} \mid a, b \in \mathbb{R}\}$. This plane P as a point set is the unique plane through the three points {origin, endpoint of $\vec{v}$'s arrow starting at the origin, endpoint of $\vec{w}$'s arrow starting at the origin}.


Therefore, any linear combination $a\vec{v} + b\vec{w}$ of the two vectors $\vec{v}$ and $\vec{w}$ will also lie in this plane. Moreover, if $\vec{u}$ is any vector in this plane, then there are unique scalars a and b so that $\vec{u} = a\vec{v} + b\vec{w}$. This fact has different meanings depending on the dimension of the space the vectors belong to. For instance, if the two vectors are elements of $\mathbb{R}^2$, then the plane formed by the two vectors is all of $\mathbb{R}^2$. However, if the vectors are elements of $\mathbb{R}^3$, then they form a plane in $\mathbb{R}^3$, which is simply a proper subset of $\mathbb{R}^3$. We will now have Mathematica illustrate this concept for us.


Example 6.1.4. We will once again consider the vectors $\vec{v} = (-10, 17) \in \mathbb{R}^2$ and $\vec{w} = (35, 8) \in \mathbb{R}^2$. The linear combination we will consider here is $2.5\vec{v} + 3.7\vec{w} = (104.5, 72.1)$, which is depicted in Figure 6.4.


Figure 6.4: The graphical depiction of a linear combination of 
vectors. 


2.5V+3.7W 


{104.5, 72.1} 




ArrowPlots3 = Graphics[{Arrowheads[.07], Thickness[.010], Black,
Arrow[{{0, 0}, V}], Black, Arrow[{{0, 0}, 2.5 V}], Black, Arrow[{{0,
0}, W}], Black, Arrow[{{0, 0}, 3.7 W}], Red, Arrow[{{0, 0}, 2.5 V
+ 3.7 W}]}];


TxtPlots3 = Graphics[{Black, Text["V", {-7.5, 2}], Text["W", {18,
-1}], Text["2.5V", {-4, 25}], Text["3.7W", {100, 15}], Text["2.5V
+ 3.7W", {75, 37}]}];


LinePlots3 = Graphics[{Thickness[.005], Blue, Line[{2.5 V, 2.5 V +
3.7 W}], Blue, Line[{3.7 W, 2.5 V + 3.7 W}]}];


Show[ArrowPlots3, LinePlots3, TxtPlots3, PlotRange -> {{-27, 133},
{-10, 75}}, Axes -> True, AxesOrigin -> {-27, -10}]
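The uniqueness claim made before this example can also be checked in the other direction: given a vector in the plane, recover the scalars a and b. The following is only a sketch; the names target and coeffs are illustrative and do not come from the text, and it relies on the built-in LinearSolve applied to the matrix whose columns are V and W.

```mathematica
(* V and W as in Example 6.1.4; target and coeffs are illustrative names *)
V = {-10, 17}; W = {35, 8};
target = {104.5, 72.1};

(* The columns of the coefficient matrix are V and W, so coeffs = {a, b}
   solves a V + b W == target *)
coeffs = LinearSolve[Transpose[{V, W}], target]

(* Rebuild the linear combination from the recovered scalars as a check *)
coeffs[[1]] V + coeffs[[2]] W
```

Up to floating-point rounding, the recovered scalars should be the 2.5 and 3.7 used in the example, and the rebuilt combination should reproduce the target vector.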




For the next set of operations, we will restrict ourselves to working in the xy-plane (i.e., $\mathbb{R}^2$); however, note that all the material presented here can be generalized to higher dimensions. First, we will let all of our vectors be represented by arrows starting at the origin. Next, we define $\vec{i}$ to be the unit vector from the origin to the point (1, 0). We can write $\vec{i}$ as its endpoint, or $\vec{i} = (1, 0)$. Similarly, let $\vec{j}$ be the unit vector from the origin to the point (0, 1), and thus $\vec{j} = (0, 1)$. Then for any scalars a and b, $a\vec{i}$ is the vector from the origin to the point (a, 0), while $b\vec{j}$ is the vector from the origin to the point (0, b). Further, the vector $a\vec{i} + b\vec{j}$ is the vector from the origin to the point (a, b). It now makes sense to represent any vector $\vec{v} = a\vec{i} + b\vec{j}$ in the xy-plane by its endpoint (a, b) when all vectors start at the origin; that is, $\vec{v} = (a, b)$. The first component of $\vec{v}$ is a, and the second component of $\vec{v}$ is b. A vector $\vec{v} = (a, b)$, with two real components, is said to be a two-dimensional vector and an element of $\mathbb{R}^2$.




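Definition 6.1.4. Given a scalar a and a vector $\vec{v} = (p_1, p_2, \ldots, p_n) \in \mathbb{R}^n$, the scalar multiplication $a\vec{v}$ is defined to be

$a\vec{v} = (a p_1, a p_2, \ldots, a p_n)$


Definition 6.1.5. Given two vectors $\vec{v} = (p_1, p_2, \ldots, p_n)$ and $\vec{w} = (q_1, q_2, \ldots, q_n)$ of $\mathbb{R}^n$, we define the vector addition, denoted $\vec{v} + \vec{w}$, to be the addition of components:

$\vec{v} + \vec{w} = (p_1 + q_1, p_2 + q_2, \ldots, p_n + q_n)$


Putting these two definitions together tells us that if $\vec{v} = a\vec{i} + b\vec{j}$ and $\vec{w} = c\vec{i} + d\vec{j}$ for any two vectors $\vec{v}$ and $\vec{w}$ in the xy-plane, then

$\alpha\vec{v} + \beta\vec{w} = (\alpha a + \beta c)\,\vec{i} + (\alpha b + \beta d)\,\vec{j}$

or

$\alpha (a, b) + \beta (c, d) = (\alpha a + \beta c, \alpha b + \beta d)$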
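Because Mathematica treats lists as vectors, the component formula above can be checked symbolically. This is a minimal sketch; the symbol names vv, ww, alpha, and beta are illustrative choices, not notation from the text.

```mathematica
(* Illustrative symbol names; clear them in case they hold earlier values *)
ClearAll[a, b, c, d, alpha, beta];
vv = {a, b}; ww = {c, d};

(* Left side: scale and add componentwise. Right side: the formula from the text.
   The comparison should evaluate to True. *)
alpha vv + beta ww == {alpha a + beta c, alpha b + beta d}

(* A numeric instance: 2 (1, 3) - (5, -7) *)
2 {1, 3} + (-1) {5, -7}
```

The numeric instance works out by hand to (2 - 5, 6 + 7) = (-3, 13).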


Example 6.1.5. We will now draw a picture to illustrate how the vector $\vec{v} = 5\vec{i} + 3\vec{j} = (5, 3)$ relates to the two unit vectors $\vec{i}$ and $\vec{j}$. Refer to Figure 6.5.


Figure 6.5: The vector $\vec{v} = 5\vec{i} + 3\vec{j} = (5, 3)$.




V = {5, 3};

VI = {1, 0}; VJ = {0, 1};

V5I = {5, 0}; V3J = {0, 3};

ArrowPlots4 = Graphics[{Arrowheads[.07], Thickness[.010], Black,
Arrow[{{0, 0}, VI}], Black, Arrow[{{0, 0}, V5I}], Black, Arrow[{{0,
0}, VJ}], Black, Arrow[{{0, 0}, V3J}], Red, Arrow[{{0, 0}, V}]}];
TxtPlots4 = Graphics[{Black, Text["J", {-0.25, 0.25}], Text["I",
{0.5, -0.25}], Text["3J", {-0.25, 2.5}], Text["5I", {4, -0.25}],
Text["V = 5I+3J", {3, 1.25}]}];

Show[ArrowPlots4, TxtPlots4, PlotRange -> {{-0.5, 5}, {-0.5, 3}},
Axes -> True, AxesOrigin -> {-0.5, -0.5}]




We next move on to three-dimensional vectors. In $\mathbb{R}^3$ with the xyz-coordinate system, we have three unit vectors $\vec{i} = (1, 0, 0)$, $\vec{j} = (0, 1, 0)$, and $\vec{k} = (0, 0, 1)$ pointing along the positive coordinate axes. Every three-dimensional vector $\vec{v}$ of $\mathbb{R}^3$ can be written as a linear combination of $\vec{i}$, $\vec{j}$, and $\vec{k}$; that is, $\vec{v} = a\vec{i} + b\vec{j} + c\vec{k} = (a, b, c)$.


Example 6.1.6. Let us plot the vector $\vec{v} = 4\vec{i} - 7\vec{j} - 3\vec{k} = (4, -7, -3)$, which is depicted in Figure 6.6.


Figure 6.6: Graphical depiction of a three-dimensional vector $\vec{v}$ expressed as a linear combination of $\vec{i}$, $\vec{j}$, and $\vec{k}$.




V = {4, -7, -3};

VI = {1, 0, 0}; VJ = {0, 1, 0}; VK = {0, 0, 1};

V4I = {4, 0, 0}; VMinus7J = {0, -7, 0}; VMinus3K = {0, 0, -3};
ArrowPlots5 = Graphics3D[{Arrowheads[.05], Thickness[.010], Black,
Arrow[{{0, 0, 0}, VI}], Black, Arrow[{{0, 0, 0}, VJ}], Black,
Arrow[{{0, 0, 0}, VK}], Blue, Arrow[{{0, 0, 0}, V4I}], Blue,
Arrow[{{0, 0, 0}, VMinus7J}], Blue, Arrow[{{0, 0, 0}, VMinus3K}],
Red, Arrow[{{0, 0, 0}, V}]}];

TxtPlots5 = Graphics3D[{Black, Text["I", {1, 0, 0}], Text["J", {0,
1, 0}], Text["K", {0, 0, 1}], Text["4I", {4, 0, 0}], Text["-7J", {0,
-7, 0}], Text["-3K", {0, 0, -3}], Text["V = 4I-7J-3K", {4, -7,
-3}]}];

Show[ArrowPlots5, TxtPlots5, Axes -> True, PlotRange -> {{-1, 6},
{-8, 2}, {-4, 2}}, AspectRatio -> 1, ViewPoint -> {8, -2, 5}]




Unfortunately, we cannot graphically depict vectors that are of dimension four or greater, but we can express them mathematically. $\mathbb{R}^n$ is the set of n-dimensional vectors $\vec{v}$ that have n components (i.e., $\vec{v} = (a_1, a_2, \ldots, a_n)$, for n scalar components $a_1, a_2, \ldots, a_n$). Vectors work in exactly the same manner, regardless of their dimension; that is, they are treated like (row or column) matrices for the purposes of computations.


We continue this section with some terminology. Given a point P in $\mathbb{R}^n$, the vector $\vec{P}$, which is represented by an arrow starting at the origin and stopping at the point P, is called the position vector for the point P; its components are the coordinates of P. If we have an arrow representing a vector $\vec{v}$ that starts at the point P and stops at the point Q, then the vector $\vec{v}$ is written as $\vec{v} = \overrightarrow{PQ}$, and it is computed by $\vec{v} = \vec{Q} - \vec{P}$. This is an example of a displacement vector. We can write any position vector as a displacement vector by letting P be the origin O, that is, the position vector $\vec{Q} = \overrightarrow{OQ}$. We now do an example of this in order to see how this works.


Example 6.1.7. Let us compute and plot the displacement vector $\vec{v} = \overrightarrow{PQ}$, which is represented by the arrow starting at the point P(-8, 11) and stopping at the point Q(7, 4). The vectors in this example are illustrated in Figure 6.7.


Figure 6.7: Displacement vector $\overrightarrow{PQ}$ and position vector $\vec{v}$ in relation to $\vec{P}$ and $\vec{Q}$. Notice that $\overrightarrow{PQ}$ and $\vec{v}$ have the same direction and magnitude.


P = {-8, 11}; Q = {7, 4};
V = Q - P


{15, -7}


ArrowPlots6 = Graphics[{Arrowheads[.07], Thickness[.010], Black,
Arrow[{{0, 0}, P}], Black, Arrow[{{0, 0}, Q}], Red, Arrow[{P, Q}],
Blue, Arrow[{{0, 0}, V}]}];

TxtPlots6 = Graphics[{Black, Text["P", {-5, 5}], Text["Q", {4.5,
1}], Text["PQ", {0, 9}], Text["V", {10, -3}]}];

Show[ArrowPlots6, TxtPlots6, PlotRange -> {{-10, 17}, {-9, 13}},
Axes -> True, AxesOrigin -> {-10, -9}]


Vectors in mathematics are very useful for doing geometry, so we need to be able to define the angle between two vectors. The angle $\theta$ between two nonzero vectors $\vec{v}$ and $\vec{w}$ is the angle between 0 and $\pi$ radians that is the smallest angle formed by placing the arrows representing $\vec{v}$ and $\vec{w}$ at the same starting point. Again, we illustrate this idea with the help of Mathematica, as depicted in Figure 6.8.


Figure 6.8: The angle $\theta$ between two vectors.




V = {-10, 17}; W = {35, 8};

ArrowPlots7 = Graphics[{Arrowheads[.07], Thickness[.010], Black,
Arrow[{{0, 0}, V}], Blue, Arrow[{{0, 0}, W}]}];

ArcPlot = Graphics[{Red, Thickness[0.015], Circle[{0, 0}, 7, {ArcTan[8/35],
Pi - ArcTan[17/10]}]}];

TxtPlots7 = Graphics[{Black, Text["θ", {1, 3}], Text["V", {-5, 12}],
Text["W", {20, 2.5}]}];

Show[ArcPlot, ArrowPlots7, TxtPlots7, Axes -> True, AxesOrigin ->
{-13, 0}]


We shall be able to compute the angle between two vectors $\vec{v}$ and $\vec{w}$ using the dot product of two vectors, which is defined in Section 6.2. We conclude this section with some useful properties of both vector addition and scalar multiplication.




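Basic Vector Addition and Scalar Multiplication Properties


Vector addition:
$\vec{u} + \vec{v} = \vec{v} + \vec{u}$
$(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$
$\vec{u} + (-\vec{v}) = \vec{u} - \vec{v}$
$\vec{v} + (-\vec{v}) = \vec{0}$
$\vec{v} + \vec{0} = \vec{v}$

Scalar multiplication:
$a(\vec{u} + \vec{v}) = a\vec{u} + a\vec{v}$
$(a + b)\vec{v} = a\vec{v} + b\vec{v}$
$(ab)\vec{v} = a(b\vec{v})$
$1\vec{v} = \vec{v}$
$0\vec{v} = \vec{0}$
$|a\vec{v}| = |a|\,|\vec{v}|$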


Homework Problems 


1. Given 7 = (1, 3), W = (5, -7), and yf = 4, 5) perform the 
following operations: 


(a) -7 (b) 4t +3 (c) -67 -27 
(a) +t- w (e) 20-37 +6w (f) -3w +27 -TU 


(g) av-—Bv (h) at—OBV¥+ 70 (i) avd+pr 


2. Given gp = C1, 3, 2), W = (5, 0, -7), and gy = 4, -2, 2), 
perform the following operations: 




(a) 3%+27 (b) 6? -27 (c) 77 +37 
(d) 2 -7+3 (e) -5V +2w -37 (f) -6w -vY +8t 


(g) at +87 (h) a-p? +yÈ (i) at+e7v 
3. Given v = (-2, 3) and YT = (1, 2) find values of a and £ for each 
of the following vectors 7 so that GF — au A BT 


(a) (1,3) (b) (=1,2) (e) (-5,6) 


(d) (-1,5}) (e) (4,4) (f) (-8,12) 


4. Given 7f = C1, 1, 0), xf = (1, 1, 0) and FF = (0, 0, 1), find values 
of a, p and y for each of the following vectors P. so that 


=at +6 +yu: 
(a) (1,3,-1) (b) (-1,2,4) (e) (-5,6,—4) 


5. Given v = I, 1, 1) and v = (1, 0, 2), find values of a and £ (if 
possible) for each of the following vectors 7 so that 


w=at + fv: l 
(a) (—5,2,4) (b) (1,4,14) (e) (0,0,1) 


(d) (0, 1, 2) (e) (1,4, 1) (f) (—1, 2, 4) 


6. For each of the vectors 7 of problem 5, construct a matrix A € 
R? whose columns are the vectors IÈ v and g}, then compute 
det(A). 
7. Interpret your results from problem 6. 
8. Let $\vec{v} = (a, b) \in \mathbb{R}^2$. The slope of $\vec{v}$ is $m_{\vec{v}} = b/a$ if $a \neq 0$,
and otherwise it does not exist.
(a) Explain why $\vec{v} = (a, b)$ and $\vec{w} = (-b, a)$ are perpendicular
(or orthogonal) vectors.
(b) Let $\vec{w} \in \mathbb{R}^2$. Explain why $\vec{v}$ and $\vec{w}$ are parallel exactly
when $m_{\vec{v}} = m_{\vec{w}}$, or both slopes do not exist.




9. Find and plot the vector 7 with the original vector F: 


(a) A unit vector 7 in the opposite direction to G = (2, 5). 
(b) A vector 7 of length 7 in the same direction as GF = (4, 


-7). 

(c) A vector 7 of length 10 in the opposite direction to a = 
(1, 6). 

(d) A vector 7 of length 3 parallel to WT = C4, 8) starting at 
PO, 8). 


(e) A vector 7 of length 13 perpendicular to WV = (7, -3). 


(f) A vector 7 of length 2 perpendicular to vy = 4, -8) 
starting at P(3, 10). 
(g) A vector 7 of length 8 parallel to the line 4x + 7y = 10. 
10. Let $\vec{v} = (2, 5)$ and $\vec{w} = (7, 3)$ be two adjacent sides of a
triangle. Find the length of the third side of this triangle using the
length of a vector.
11. Using trigonometry, find the angle between the two vectors $\vec{v}$
= (-2, 5) and $\vec{w}$ = (7, 3).
12. Find the distance between the two points P(1, -4, 7, 0, 2) and Q(-9,
3, 5, -6, 8).


Mathematica Problems 


1. Graph each set of three vectors from homework problem 3. 


2. Graph each set of three vectors from homework problem 5. Is it 
easy to see from the visual depiction of the sets of vectors why your 
answers to homework problem 5 are what they are? 


3. Find and plot the vector 7 with the original vector an: 
(a) A unit vector 7 in the opposite direction to 7 =, 5,9) 
(b) A vector 77 of length 7 in the same direction as GF = (4, -7, 
3) 
(c) A vector 7 of length 10 in the opposite direction to 7 = 
(1, 6, 4) 




(d) A vector 7 of length 3 parallel to G = (4, 8, —5) starting 
at the point P(5, 8, 2). 
4. Using trigonometry, find the angle between the two vectors Wf = 
(-2, 5, 7) and 7 = (6, —3, 9). Plot both of these vectors. 
5. Find the lengths of the following vectors: 


(a) (—2,3,—5,7,0,4) (b) (—9, -5, 2, —7, 1, —3, 4, 0, 6) 


6.2 Dot Product 


We spent the entire last section developing the basic arithmetic concepts 
of addition, subtraction, and scalar multiplication for vectors, which also 
gave rise to linear combinations of vectors. The next concept from 
arithmetic that we wish to develop for vectors is “multiplication” of two 
vectors. The questions remain: How and what is the end result? One form 
of vector multiplication is known as the dot product, which takes two 
vectors and returns a real scalar. 


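Definition 6.2.1. The dot product of two n-dimensional column vectors $\vec{v}, \vec{w} \in \mathbb{R}^n$ is a real number, denoted $\vec{v} \cdot \vec{w}$, and defined in terms of matrix multiplication by $\vec{v} \cdot \vec{w} = \vec{v}^{T} \vec{w}$. More specifically, if $\vec{v} = (v_1, v_2, \ldots, v_n)$ and $\vec{w} = (w_1, w_2, \ldots, w_n)$, then


(6.2) $\vec{v} \cdot \vec{w} = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n = \sum_{k=1}^{n} v_k w_k$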


As we will discover shortly, the dot product of two vectors is a very important concept since it incorporates, in a single construction, the two ideas of the length of a vector and the angle between two vectors. Since the dot product $\vec{v} \cdot \vec{v}$ of a vector $\vec{v}$ with itself is the squared length, $|\vec{v}|^2$, of $\vec{v}$, the dot product is actually a generalization of the basic geometric concept of length. Amazingly, the dot product of two vectors also allows us to determine the angle between them, and so the dot product encompasses all of basic geometry, since it is the embodiment of both length and angle.


It should be fairly clear from this definition of dot product that it has the following properties: (1) it is symmetric in that $\vec{v} \cdot \vec{w} = \vec{w} \cdot \vec{v}$, and (2) we have the length property $\vec{v} \cdot \vec{v} = |\vec{v}|^2$, which leads us to the following definition:


Definition 6.2.2. The length (norm, magnitude) of a vector $\vec{v} \in \mathbb{R}^n$, denoted $|\vec{v}|$, is given, in terms of the dot product, by the formula


(6.3) $|\vec{v}| = \sqrt{\vec{v} \cdot \vec{v}}$


One more important property of the dot product is that it is distributive, in the sense that for any real numbers a and b and any vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ in $\mathbb{R}^n$:

(6.4) $\vec{u} \cdot (a\vec{v} + b\vec{w}) = a(\vec{u} \cdot \vec{v}) + b(\vec{u} \cdot \vec{w})$


The identity given above is simple to show, and even simpler if you consider the dot product as a matrix multiplication, since we know that matrix multiplication is distributive.
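For readers who want to see the component computation behind the distributive property, here is a short worked derivation. It assumes the component labels $u_k$, $v_k$, $w_k$ for the three vectors (labels chosen here for illustration) and uses only the sum formula for the real dot product.

```latex
\begin{align*}
\vec{u}\cdot(a\vec{v}+b\vec{w})
  &= \sum_{k=1}^{n} u_k\,(a v_k + b w_k) \\
  &= a\sum_{k=1}^{n} u_k v_k + b\sum_{k=1}^{n} u_k w_k \\
  &= a\,(\vec{u}\cdot\vec{v}) + b\,(\vec{u}\cdot\vec{w}).
\end{align*}
```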


Example 6.2.1. Similar to matrix multiplication, Mathematica uses the dot (.) to represent the dot product of vectors. We will now use it to find the dot product of the two vectors $\vec{v} = (5, -1, 2, 3)$ and $\vec{w} = (-4, 0, -5, 7)$:

V = {5, -1, 2, 3}; W = {-4, 0, -5, 7};
V.W
-9

Let us now see how the dot product of two vectors is related to the vector lengths of each and the angle $\theta$ between them for vectors of dimension two or three. The angle $\theta$ between a pair of two- or three-dimensional vectors $\vec{v}$ and $\vec{w}$ is the angle between 0 and $\pi$ radians obtained by placing them as arrows with the same base point and rotating one of them to the other through the smallest possible positive angle. We will also generalize this concept to define the angle between any two vectors of the same dimension.


The key to seeing this relationship is writing out the law of cosines for the triangle formed by the three vectors $\vec{v}$, $\vec{w}$, and $\vec{v} - \vec{w}$, using the angle $\theta$ between $\vec{v}$ and $\vec{w}$. Applying the law of cosines to this triangle, we get


(6.5) $|\vec{v} - \vec{w}|^2 = |\vec{v}|^2 + |\vec{w}|^2 - 2\,|\vec{v}|\,|\vec{w}| \cos(\theta)$


For simplicity, we will next assume that the vectors are of dimension two so that $\vec{v} = (v_1, v_2)$ and $\vec{w} = (w_1, w_2)$, in terms of components. Then


$|\vec{v} - \vec{w}|^2 - |\vec{v}|^2 - |\vec{w}|^2 = (v_1 - w_1)^2 + (v_2 - w_2)^2 - (v_1^2 + v_2^2 + w_1^2 + w_2^2)$
$= -2\,(v_1 w_1 + v_2 w_2)$

Now putting this into equation (6.5) gives

$-2\,(v_1 w_1 + v_2 w_2) = -2\,|\vec{v}|\,|\vec{w}| \cos(\theta)$

or

$v_1 w_1 + v_2 w_2 = |\vec{v}|\,|\vec{w}| \cos(\theta)$


The left-hand side of this equation is simply the dot product of $\vec{v}$ and $\vec{w}$, so we obtain


(6.6) $\vec{v} \cdot \vec{w} = |\vec{v}|\,|\vec{w}| \cos(\theta)$


This formula now tells us that taking the dot product of two two- or 
three-dimensional vectors is a way of incorporating length and angle for 
these two vectors into a single quantity. We now arrive at the following 
definition. 


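Definition 6.2.3. The general angle $\theta$ between any two nonzero vectors $\vec{v}$ and $\vec{w}$ of $\mathbb{R}^n$ is given by the formula


(6.7) $\theta = \cos^{-1}\!\left( \dfrac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} \right)$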


Once again, note that equation (6.7) is the definition of the angle $\theta$ between any two nonzero vectors in any dimension, instead of just the dimensions two or three where we can picture things geometrically. Recall that $\cos^{-1}(x)$ is the unique angle $\theta$ between 0 and $\pi$ radians for which $\cos(\theta) = x$ for any $x \in [-1, 1]$. This tells us that our general angle definition will work only if $|\vec{v} \cdot \vec{w}| \leq |\vec{v}|\,|\vec{w}|$ for any two vectors $\vec{v}$ and $\vec{w}$ in $\mathbb{R}^n$. This inequality is called the Cauchy-Schwarz inequality and you are asked to prove it in homework problem 5.
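As a quick numerical illustration of the general angle formula, the sketch below reuses the four-dimensional vectors of Example 6.2.1, where no picture is available. The names V4, W4, and theta4 are illustrative only and are not taken from the text.

```mathematica
(* Vectors from Example 6.2.1; V4, W4, theta4 are illustrative names *)
V4 = {5, -1, 2, 3}; W4 = {-4, 0, -5, 7};

(* Cauchy-Schwarz: |V4.W4| <= |V4| |W4|, so the ArcCos argument lies in [-1, 1] *)
Abs[V4.W4] <= Norm[V4] Norm[W4]

(* The general angle between V4 and W4, in radians *)
theta4 = N[ArcCos[(V4.W4)/(Norm[V4] Norm[W4])]]
```

Here the dot product is -9 while the product of the norms is Sqrt[39] Sqrt[90], so the inequality holds and the angle is slightly larger than a right angle (roughly 1.72 radians).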


Example 6.2.2. As an example, we draw the triangle (Fig. 6.9) with adjacent sides the vectors $\vec{v}$ and $\vec{w}$, together with its altitude from the tip of the vector $\vec{v} = (12, 19)$ to the base vector $\vec{w} = (35, 0)$:

Figure 6.9: Vectors $\vec{v}$ and $\vec{w}$ and the angle between them.


V = {12, 19}; W = {35, 0};

ArrowPlots = Graphics[{Arrowheads[.10], Thickness[.010], Blue,
Arrow[{{0, 0}, V}], Black, Arrow[{{0, 0}, W}], Red, Arrow[{W, W +
(V - W)}]}];

LinePlot = Graphics[{Thickness[.005], Black, Line[{{12, 0}, V}]}];
TxtPlots = Graphics[{Black, Text["V", {5, 10}], Text["W", {16,
-1}], Text["V-W", {23, 12}], Text["H", {13.5, 10}], Text["θ", {4,
2}]}];

RightAnglePlot = Graphics[{Black, Rectangle[{12, 0}, {14, 2}]}];
ArcPlot = Graphics[{Black, Thickness[0.010], Circle[{0, 0}, 7, {0,
ArcTan[19/12]}]}];

Show[ArcPlot, LinePlot, ArrowPlots, TxtPlots, RightAnglePlot]




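The dot product formula can also be used to find the area of the triangle in Figure 6.9 with sides given by the vectors $\vec{v}$, $\vec{w}$, and $\vec{v} - \vec{w}$. The area of the triangle is given by $A = \frac{1}{2} b h$, where h is the length of our altitude and b is the length of the base. In our situation, we have $A = \frac{1}{2}\,|\vec{w}|\,|\vec{v}| \sin(\theta)$, since $b = |\vec{w}|$ and $h = |\vec{v}| \sin(\theta)$. Squaring this equation gives


$A^2 = \frac{1}{4}\,|\vec{v}|^2\,|\vec{w}|^2 \sin^2(\theta)$
$= \frac{1}{4}\,|\vec{v}|^2\,|\vec{w}|^2 \left(1 - \cos^2(\theta)\right)$
$= \frac{1}{4} \left( |\vec{v}|^2\,|\vec{w}|^2 - (\vec{v} \cdot \vec{w})^2 \right)$


Finally, we arrive at the formula for the area of the triangle in terms of the dot product:


(6.8) $A = \frac{1}{2} \sqrt{|\vec{v}|^2\,|\vec{w}|^2 - (\vec{v} \cdot \vec{w})^2}$


Notice that in order to find the area of this triangle, all we need are its two adjacent side vectors $\vec{v}$ and $\vec{w}$.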
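The area formula can be sanity-checked on the triangle of Figure 6.9, where the base lies along the x-axis and the altitude is simply the second component of V. This is a sketch only; the names areaDot and areaBaseHeight are illustrative and not from the text.

```mathematica
(* Triangle of Example 6.2.2 *)
V = {12, 19}; W = {35, 0};

(* Area from the dot product formula: half the square root of
   |V|^2 |W|^2 - (V.W)^2 *)
areaDot = 1/2 Sqrt[Norm[V]^2 Norm[W]^2 - (V.W)^2]

(* Area from one half base times height: base 35, height 19 *)
areaBaseHeight = 1/2*35*19

(* The two computations should agree *)
areaDot == areaBaseHeight
```

By hand, |V|^2 |W|^2 - (V.W)^2 = 505 * 1225 - 420^2 = 442225 = 665^2, so both expressions equal 665/2 = 332.5.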


Example 6.2.3. Let us now do an example of using our dot product formula to find the three interior angles and the area of the triangle with vertices at P(-8, 3, 17), Q(10, -6, -9), and R(30, 13, 25), which is depicted in Figure 6.10. We will compute all three angles directly from our formula and then verify that they add up to $\pi$. As we proceed, pay special attention to how we compute the magnitude of a vector in Mathematica using the Norm command. The concept of taking the norm of a vector is much more general than what we have discussed here; hence there are many different ways to compute the norm of a vector:


Figure 6.10: Triangle formed by the points P, Q, and R.




P = {-8, 3, 17}; Q = {10, -6, -9}; R = {30, 13, 25};
TrianglePQR = Graphics3D[{Blue, Opacity[.5], Polygon[{P, Q, R}]}];
ArrowPlots = Graphics3D[{Arrowheads[.1], Thickness[.015], Red,
Arrow[{P, Q}], Black, Arrow[{P, R}], Blue, Arrow[{R, Q}]}];
TxtPlots = Graphics3D[{Black, Text["PQ", {-3, -2, 4}], Text["PR",
{11, 8, 22.5}], Text["RQ", {22, 4, 8}]}];

Show[TrianglePQR, ArrowPlots, TxtPlots, Axes -> True, ViewPoint
-> {17, -25, 20}]




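PQ = Q - P
{18, -9, -26}
PR = R - P
{38, 10, 8}
RQ = Q - R
{-20, -19, -34}

(* PQ and PR both start at P, so this is the interior angle at P *)
AngleP = ArcCos[N[(PQ.PR)/(Norm[PQ] Norm[PR])]]
1.27367

(* PQ and RQ both end at Q, so this is the interior angle at Q *)
AngleQ = ArcCos[N[(PQ.RQ)/(Norm[PQ] Norm[RQ])]]
1.06696

(* RQ and -PR both start at R, so this is the interior angle at R *)
AngleR = ArcCos[N[(RQ.(-PR))/(Norm[RQ] Norm[PR])]]
0.800967

AngleP + AngleQ + AngleR

3.14159

.5 Sqrt[Norm[PQ]^2 Norm[PR]^2 - (PQ.PR)^2]
630.328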


As expected, the sum of these three interior angles of the triangle is $\pi$, and each angle is easily computable from the dot product formula once we have three vectors that define the sides of the triangle. We have also used equation (6.8) to compute the area of the triangle.


Mathematica does have a built-in command to find the angle between two real- or complex-valued vectors. We reproduce the angle calculations from above using the VectorAngle command:


VectorAngle[PQ, PR]
ArcCos[193/Sqrt[434562]]

N[VectorAngle[PQ, PR]]
1.27367

N[VectorAngle[PQ, RQ]]
1.06696

N[VectorAngle[RQ, -PR]]
0.800967


Following in this train of thought, notice that if we return to our dot product expression (6.6)


$\vec{v} \cdot \vec{w} = |\vec{v}|\,|\vec{w}| \cos(\theta)$


we can also easily determine when two vectors are perpendicular (or orthogonal) to each other. Remember that if two vectors are orthogonal, then the angle $\theta$ between them must be $\pi/2$. Since $\cos(\pi/2) = 0$, putting these two pieces of information together tells us that the two nonzero vectors $\vec{v}$ and $\vec{w}$ are orthogonal exactly when $\vec{v} \cdot \vec{w} = 0$. This is an extremely convenient way to test for the orthogonality of two vectors as well as a useful tool to create a vector orthogonal to a given one.


Example 6.2.4. Starting with $\vec{v} = (-4, 5, 11, -3)$, we will find two vectors, $\vec{u}$ and $\vec{w}$, orthogonal to $\vec{v}$ by switching components of $\vec{v}$ and altering signs:


V = {-4, 5, 11, -3}; U = {5, 4, 3, 11}; W = {11, 3, 4, 5};
{U.V, V.W}
{0, 0}

U.W
134

N[ArcCos[U.W/(Norm[U] Norm[W])]]
0.670315


So the pairs $\vec{u}$ and $\vec{v}$ as well as $\vec{v}$ and $\vec{w}$ are orthogonal, while the pair $\vec{u}$ and $\vec{w}$ is not.




So far, the dot product has been explicitly defined for vectors with real components. We now generalize this to vectors in $\mathbb{C}^n$ with complex components, but first we formally define the vector space $\mathbb{C}^n$.


Definition 6.2.4. The vector space $\mathbb{C}^n$ is the set of vectors $\vec{v}$ written as $(q_1, q_2, \ldots, q_n)$ with n complex components $q_1, q_2, \ldots, q_n$.


In this definition of the complex vector space $\mathbb{C}^n$, note that the scalars are complex numbers. For the real vector space $\mathbb{R}^n$, the scalars are real. However, the same vector space properties of addition and scalar multiplication hold in $\mathbb{C}^n$ as they do in $\mathbb{R}^n$, just for a different set of scalars.


The complex dot product on $\mathbb{C}^n$ is designed to be the same as the real dot product on $\mathbb{R}^n$ (since $\mathbb{R}^n \subset \mathbb{C}^n$) so that when we use it on a pair of real vectors it agrees with the real dot product. Moreover, the complex dot product of a vector with itself should still give a nonnegative real number so that $\vec{v} \cdot \vec{v} = |\vec{v}|^2$ for $\vec{v} \in \mathbb{C}^n$ can define the length $|\vec{v}|$, where it must also be true that $|\vec{v}| = 0$ if and only if $\vec{v} = \vec{0}$. The complex dot product on $\mathbb{C}^n$ is the simplest extension of the real dot product on $\mathbb{R}^n$ that behaves in this manner.


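Definition 6.2.5. The dot product of two complex-valued n-dimensional column vectors $\vec{u}$ and $\vec{v}$ is a complex number, denoted $\vec{u} \cdot \vec{v}$, and defined in terms of matrix multiplication by $\vec{u} \cdot \vec{v} = \vec{u}^{T}\, \overline{\vec{v}}$, where $\overline{\vec{v}}$ denotes the conjugate vector of $\vec{v}$. More specifically, if

$\vec{u} = (a_1 + i b_1, a_2 + i b_2, \ldots, a_n + i b_n)$, $\vec{v} = (c_1 + i d_1, c_2 + i d_2, \ldots, c_n + i d_n)$

with the $a_k$, $b_k$, $c_k$, and $d_k$ values all real, then


(6.9) $\vec{u} \cdot \vec{v} = \begin{bmatrix} a_1 + i b_1 & a_2 + i b_2 & \cdots & a_n + i b_n \end{bmatrix} \begin{bmatrix} c_1 - i d_1 \\ c_2 - i d_2 \\ \vdots \\ c_n - i d_n \end{bmatrix} = \sum_{k=1}^{n} (a_k + i b_k)(c_k - i d_k)$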


Notice that the preceding definition of the dot product does indeed reduce to the already given formulation if $\vec{u}$ and $\vec{v}$ are both real. However, one surprising result is that $\vec{u} \cdot \vec{v} = \overline{\vec{v} \cdot \vec{u}}$. Furthermore, when computing the norm of a complex vector, notice that it will always end up being real-valued:


$|\vec{u}| = \sqrt{\vec{u} \cdot \vec{u}} = \sqrt{\sum_{k=1}^{n} (a_k + i b_k)(a_k - i b_k)} = \sqrt{\sum_{k=1}^{n} \left( a_k^2 + b_k^2 \right)}$
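The conjugate symmetry of the complex dot product follows in a few lines from the component formula. The derivation below is a sketch that assumes the sum form of the definition, $\vec{u}\cdot\vec{v} = \sum_k (a_k + i b_k)(c_k - i d_k)$, and the fact that conjugation distributes over sums and products.

```latex
\begin{align*}
\overline{\vec{v}\cdot\vec{u}}
  &= \overline{\sum_{k=1}^{n} (c_k + i d_k)(a_k - i b_k)} \\
  &= \sum_{k=1}^{n} (c_k - i d_k)(a_k + i b_k) \\
  &= \vec{u}\cdot\vec{v}.
\end{align*}
```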


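Example 6.2.5. We let Mathematica do one example for us, choosing the following two vectors in $\mathbb{C}^3$: $\vec{u} = (2 - i, 1 - 3i, i)$ and $\vec{v} = (2 - 5i, 6 + 7i, -3)$. We will compute $\vec{u} \cdot \vec{v}$, $\vec{v} \cdot \vec{u}$, and $|\vec{u}|$, making sure that the properties of the complex dot product just described do indeed hold. Note that Mathematica's dot (.) does not conjugate the second vector, so the complex dot product of Definition 6.2.5 is obtained with u.Conjugate[v]:


u = {2 - I, 1 - 3 I, I}; v = {2 - 5 I, 6 + 7 I, -3};
u.v
26 - 26 i

v.u
26 - 26 i

u.Conjugate[v]
-6 - 20 i

v.Conjugate[u]
-6 + 20 i

Norm[u]
4

Sqrt[Sum[u[[k]] Conjugate[u[[k]]], {k, 1, Length[u]}]]
4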

We conclude this section with a table of dot product properties. Note the subtle differences between the real and the complex vector cases. It was shown previously that $\vec{u} \cdot \vec{v} = \overline{\vec{v} \cdot \vec{u}}$ for complex-valued vectors $\vec{u}$ and $\vec{v}$. However, a more subtle difference can be seen in the very last property. For the real-valued case, a and b are real scalars and can be moved around in the dot products in any way we wish. However, in the case of the dot product of two complex-valued vectors, where $\alpha$ and $\beta$ are complex scalars, to pull the scalar $\beta$ (which is multiplied by the second vector in the dot product) out of the dot product, we must take the complex conjugate of $\beta$.
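The conjugate rule for a scalar in the second slot can be checked numerically. In this sketch, cdot is a hypothetical helper (not a built-in and not from the text) that implements the complex dot product by conjugating the second argument, and beta is an arbitrary illustrative scalar.

```mathematica
(* cdot is a hypothetical helper: Mathematica's Dot does not conjugate,
   so the second argument is conjugated explicitly *)
cdot[x_, y_] := x.Conjugate[y]

u = {2 - I, 1 - 3 I, I}; v = {2 - 5 I, 6 + 7 I, -3};
beta = 1 + 2 I;  (* arbitrary complex scalar chosen for illustration *)

(* Pulling beta out of the second slot requires its conjugate *)
cdot[u, beta v] == Conjugate[beta] cdot[u, v]

(* Conjugate symmetry: u.v equals the conjugate of v.u *)
cdot[u, v] == Conjugate[cdot[v, u]]
```

Both comparisons should return True; by hand, Conjugate[beta] cdot[u, v] = (1 - 2i)(-6 - 20i) = -46 - 8i.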




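Dot Product Properties


Real dot product ($\vec{u}, \vec{v}, \vec{w} \in \mathbb{R}^n$; $a, b \in \mathbb{R}$):
$\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u}$
$(\vec{u} + \vec{v}) \cdot \vec{w} = \vec{u} \cdot \vec{w} + \vec{v} \cdot \vec{w}$
$\vec{u} \cdot \vec{0} = 0$
$(a\vec{u}) \cdot \vec{v} = a(\vec{u} \cdot \vec{v})$
$\vec{u} \cdot (b\vec{v}) = b(\vec{u} \cdot \vec{v})$

Complex dot product ($\vec{u}, \vec{v}, \vec{w} \in \mathbb{C}^n$; $\alpha, \beta \in \mathbb{C}$):
$\vec{u} \cdot \vec{v} = \overline{\vec{v} \cdot \vec{u}}$
$(\vec{u} + \vec{v}) \cdot \vec{w} = \vec{u} \cdot \vec{w} + \vec{v} \cdot \vec{w}$
$\vec{u} \cdot \vec{0} = 0$
$(\alpha\vec{u}) \cdot \vec{v} = \alpha(\vec{u} \cdot \vec{v})$
$\vec{u} \cdot (\beta\vec{v}) = \overline{\beta}\,(\vec{u} \cdot \vec{v})$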


Homework Problems 


1. Compute the dot products of the following pairs of vectors: 
(a) (—5,2), (3, —2) (b) (1,4), (-6,3) 
(c) (-2-4,1), (3,-2i)  (d) (3,2,—2), (4,3, -2) 
(e) (1,0,4), (2, 6,0) (£) (4,2,5), (1,3, —2) 


2. Compute the norms of the following vectors: 


(a) (—5,2) (b) (3,3) (c) (5,3 — 2%) 
(a) (1,0,-2) (e) (4,0,4) (£) (3-6i,-42+i) 


(æ) (1,2,-1,1) (h) (-1,2,5,3,2) (i) (1-i,8+2i,342i,1-3) 




3. Determine the angle between the following pairs of vectors: 
(a) (—3,6), (4,2) (b) (1,—4), (-2, 2) (c) (—2,1), (2,-1) 
(d) (1,0,—2), (0,3,0) (e) (3,—2,4), (2,6, 2) 


4. For each of the following vectors $\vec{v}$, find a second vector $\vec{w}$
such that $\vec{v} \perp \vec{w}$:


(a) (2,3) (b) (5,—1) (c) (1,0,5) 


(d) (2,1,3) (e) (7,0,—2,2) (f) (—2,3,1,5) 


5. Prove the Cauchy-Schwarz inequality, which states that if $\vec{u}, \vec{v} \in \mathbb{R}^n$, then


$|\vec{u} \cdot \vec{v}| \leq |\vec{u}|\,|\vec{v}|$


6. Use the Cauchy-Schwarz inequality to prove the triangle inequality, which states that for $\vec{u}, \vec{v} \in \mathbb{R}^n$


$|\vec{u} + \vec{v}| \leq |\vec{u}| + |\vec{v}|$


(Hint: Start with the fact that $|\vec{u} + \vec{v}|^2 = (\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v})$ and expand the right-hand side using the distributivity of the dot product.)


7. Find the area and the interior angles of the triangle with the two
adjacent sides $\vec{v} = (-5, -9)$ and $\vec{w} = (7, 3)$.


8. Find the area and the interior angles of the triangle with the two
adjacent sides $\vec{v} = (-5, -9, 12)$ and $\vec{w} = (7, 3, -6)$.


9. Consider the following two vectors in $\mathbb{C}^4$:
$\vec{v} = (-5 + 4i, -9 - i, 12, 7i)$, $\vec{w} = (7 - 2i, 3, -6 + 5i, 1 + i)$


For parts (c) and (d), you may wish to refer back to problems 5 and
6, and make use of the triangle and Cauchy-Schwarz inequalities.
(a) Compute $\vec{v} \cdot \vec{w}$ and $\vec{w} \cdot \vec{v}$.
(b) Find the norms of the two vectors $\vec{v}$ and $\vec{w}$.
(c) Is it true that $|\vec{v} + \vec{w}| \leq |\vec{v}| + |\vec{w}|$?
(d) Is it true that $|\vec{v} \cdot \vec{w}| \leq |\vec{v}|\,|\vec{w}|$?
10. Find a vector $\vec{w}$ of norm 7 perpendicular to $\vec{v} = (-5, -9)$.
11. Find two different vectors $\vec{w}$ of norm 4 perpendicular to $\vec{v}$ =
(7, 3, 6).
12. Find two nonzero vectors $\vec{w}_1$ and $\vec{w}_2$ perpendicular to $\vec{v}$ = (7,
3, -6), where $\vec{w}_1$ and $\vec{w}_2$ are also perpendicular.
13. Let $\vec{v} = (a, b, c) \in \mathbb{R}^3$ be fixed. What equation must $\vec{w}$ = (x, y,
z) $\in \mathbb{R}^3$ satisfy for $\vec{v}$ and $\vec{w}$ to be perpendicular, and what does
this equation represent in $\mathbb{R}^3$?
14. Find the equation of the plane in $\mathbb{R}^3$ that is perpendicular to the
vector $\vec{v}$ = (7, 3, -6) and goes through the origin (0, 0, 0).
15. Find the equation of the plane in $\mathbb{R}^3$ that is perpendicular to the
vector $\vec{v}$ = (7, 3, -6) and goes through the point P(-2, 5, 8).
16. Use the dot product to show that the diagonals of a square are 
perpendicular. 


17. Find a formula for the angle between the two diagonals of a 
parallelogram in terms of the lengths of any two of its adjacent 
sides. 


18. Let f(x) and g(x) be any two real continuous functions for $x \in [a, b]$. Define the dot product of f(x) and g(x) by


$f(x) \cdot g(x) = \int_a^b f(x)\, g(x)\, dx$


(a) Find $f(x) \cdot g(x)$, $f(x) \cdot f(x)$, and $g(x) \cdot g(x)$ for $f(x) = e^x$ and
$g(x) = \cos(x)$ on the interval $[0, \pi]$.


(b) Show that this dot product of two functions satisfies the 
usual properties of a real dot product. 


(c) Find all possible dot products of the functions


$\{1, \cos(x), \cos(2x), \sin(x), \sin(2x)\}$


with each other and themselves on the interval $[0, 2\pi]$. Do you
see a pattern here, and if so, what is it?


(d) Find all possible dot products of the functions


$\{1, x, x^2, x^3\}$
with each other and themselves on the interval [-1, 1].

19. Let A be a real n x n matrix with $\vec{v}, \vec{w} \in \mathbb{R}^n$ written as column
vectors in $\mathbb{R}^n$. Show that the real dot product satisfies
$(A\vec{v}) \cdot \vec{w} = \vec{v} \cdot (A^{T}\vec{w})$. Give an example when n = 2 to
illustrate this formula.

20. Let A be a complex n x n matrix with $\vec{v}, \vec{w} \in \mathbb{C}^n$ written as
column vectors in $\mathbb{C}^n$. Show that the complex dot product satisfies
$(A\vec{v}) \cdot \vec{w} = \vec{v} \cdot (\overline{A}^{T}\vec{w})$, where $\overline{A}^{T}$ is the conjugate transpose of A. Give an example when n = 2 to
illustrate this formula.


Mathematica Problems 
1. Compute the angle between each of the following pairs of vectors
$\vec{v}$ and $\vec{w}$:
(a) $\vec{v}$ = (-5, 2), $\vec{w}$ = (3, -2) (b) $\vec{v}$ = (1, 4), $\vec{w}$ = (-6, 3)


(c) $\vec{v}$ = (-2, 3), $\vec{w}$ = (5, 0) (d) $\vec{v}$ = (6, -2), $\vec{w}$ = (-7, -9)


2. Starting with each of the vectors $\vec{v}$ from problem 1, use rotation matrices and scalar multiplication to transform them to the vectors $\vec{w}$. Treating the vectors as column matrices, you should be able to express each vector $\vec{w}$ as


$\vec{w} = r A_{\theta} \vec{v}$


3. Graph each pair of vectors from problem 1.
4. Compute the angle between each of the following pairs of vectors $\vec{v}$ and $\vec{w}$:


(a) $\vec{v}$ = (1, -5, 0), $\vec{w}$ = (0, 5, 2) (b) $\vec{v}$ = (1, 0, 4), $\vec{w}$ = (-6, 3, 4)


5. The angles between each pair of vectors in problem 4 do not necessarily correspond to an angle in the xy-, xz-, or yz-plane. Can you come up with a process similar to that of the two-dimensional case in which the initial vector $\vec{v}$ was rotated and scaled appropriately to become the vector $\vec{w}$? You may wish to refer to homework problem 10 from Section 4.4, where the 3x3 matrices for rotations about the x-, y-, and z-axes were given.

6. Let P(-1, 3), Q(2, -5), R(7, 11), and T(1, 15) be the
counterclockwise vertices of a quadrilateral.


(a) Find all four interior angles of this quadrilateral and the
lengths of all four sides.
(b) What is the sum of these four interior angles?
(c) Find the area of this quadrilateral.
7. Find the area and the three interior angles for the triangle with
vertices at P(-2, 7, 3), Q(8, -10, 4), and R(6, 1, 9).
8. Plot the results of homework problems 14 and 15, along with 
their corresponding perpendicular vectors. 
9. Verify your answers to homework problems 18 (a), (c), and (d). 


10. Give examples when n = 4 to illustrate the formulas of 
homework problems 19 and 20. 


6.3 Cross Product 


As discussed in Section 6.2, the dot product of two vectors results in a scalar. The next vector operation that we will consider is the cross product, which can be applied only to vectors in $\mathbb{R}^3$. The cross product of two vectors in $\mathbb{R}^3$, denoted $\vec{u} \times \vec{v}$, is designed to produce another vector in $\mathbb{R}^3$ that is orthogonal to both $\vec{u}$ and $\vec{v}$.


Definition 6.3.1. The cross product of two vectors $\vec{u} = (u_1, u_2, u_3)$, $\vec{v} = (v_1, v_2, v_3)$ in $\mathbb{R}^3$ is denoted $\vec{u} \times \vec{v}$ and is computed by the following formula:


$\vec{u} \times \vec{v} = \det \begin{bmatrix} \vec{i} & \vec{j} & \vec{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}$

$= (u_2 v_3 - v_2 u_3)\,\vec{i} - (u_1 v_3 - v_1 u_3)\,\vec{j} + (u_1 v_2 - v_1 u_2)\,\vec{k}$

$= (u_2 v_3 - v_2 u_3,\; -u_1 v_3 + v_1 u_3,\; u_1 v_2 - v_1 u_2)$


In this formulation, notice that the determinant was expanded along the first row of the matrix, while $\vec{u}$ and $\vec{v}$ make up the second and third rows of the matrix, respectively.


The key to defining the cross product is realizing the following two facts: (1) the determinant of a square matrix is 0 if two rows are the same, and (2) if we write $\vec{s} = (s_1, s_2, s_3)$ and $\vec{t} = t_1\vec{i} + t_2\vec{j} + t_3\vec{k}$, where $\vec{i} = (1, 0, 0)$, $\vec{j} = (0, 1, 0)$, and $\vec{k} = (0, 0, 1)$, then $\vec{s} \cdot \vec{t} = t_1 s_1 + t_2 s_2 + t_3 s_3$, where the three unit vectors involved in writing out $\vec{t}$ have been replaced by the three components of the vector $\vec{s}$.


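Theorem 6.3.1. The vector $\vec{u} \times \vec{v}$ is orthogonal to both of the vectors $\vec{u}$ and $\vec{v}$.


Proof. Note that for any vector $\vec{s} = (s_1, s_2, s_3)$, we have

$\vec{s} \cdot (\vec{u} \times \vec{v}) = \det \begin{bmatrix} s_1 & s_2 & s_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}$


By setting $\vec{s} = \vec{u}$ or $\vec{s} = \vec{v}$ in the formula above, note that the resulting matrix will have a repeated row, yielding a determinant of 0. Therefore, both $\vec{u} \cdot (\vec{u} \times \vec{v}) = 0$ and $\vec{v} \cdot (\vec{u} \times \vec{v}) = 0$, and we can conclude that $\vec{u} \times \vec{v}$ is orthogonal to both $\vec{u}$ and $\vec{v}$.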
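The repeated-row argument can also be confirmed by brute force. The following worked expansion is a sketch that assumes the component form of the cross product, $\vec{u}\times\vec{v} = (u_2v_3 - v_2u_3,\ -(u_1v_3 - v_1u_3),\ u_1v_2 - v_1u_2)$, and simply multiplies out the dot product with $\vec{u}$; the case of $\vec{v}$ is analogous.

```latex
\begin{align*}
\vec{u}\cdot(\vec{u}\times\vec{v})
  &= u_1(u_2v_3 - v_2u_3) - u_2(u_1v_3 - v_1u_3) + u_3(u_1v_2 - v_1u_2) \\
  &= u_1u_2v_3 - u_1u_3v_2 - u_1u_2v_3 + u_2u_3v_1 + u_1u_3v_2 - u_2u_3v_1 \\
  &= 0.
\end{align*}
```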


One of the simplest properties to see of the cross product is that


(6.10) $\vec{u} \times \vec{v} = -(\vec{v} \times \vec{u})$


This is due to the fact that swapping $\vec{u}$ and $\vec{v}$ in the matrix used to compute the cross product can be achieved through one multiplication by a type I elementary matrix, which changes the sign of the determinant.


Example 6.3.1. Again, we need an example to illustrate these ideas. Let us find the cross
product of $\vec{u} = (-3, 7, 11)$ and $\vec{v} = (6, -1, 9)$ and also test that
$\vec{u} \times \vec{v}$ is orthogonal to both $\vec{u}$ and $\vec{v}$ by using the dot
product. We shall also plot all three vectors:


U = {-3, 7, 11}; V = {6, -1, 9};
Det[{{"I", "J", "K"}, U, V}]
74 I + 93 J - 39 K
Cross[U, V]
{74, 93, -39}
Cross[V, U]
{-74, -93, 39}
Cross[U, V].U
0
Cross[U, V].V
0

W = {18, -10, -15};
Det[{W, U, V}]
987
W.Cross[U, V]
987


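Now we plot the vectors $\vec{u}$, $\vec{v}$, and $\tfrac{1}{5}(\vec{u} \times \vec{v})$,
where the factor $\tfrac{1}{5}$ scales back the length of $\vec{u} \times \vec{v}$
so it is close to the lengths of $\vec{u}$ and $\vec{v}$.


Figure 6.11: The cross product yields a vector orthogonal to both
$\vec{u}$ and $\vec{v}$.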


ArrowPlots = Graphics3D[{Arrowheads[.07], Thickness[.012], Black,
Arrow[{{0, 0, 0}, U}], Blue, Arrow[{{0, 0, 0}, V}], Red, Arrow[{{0,
0, 0}, 0.2 Cross[U, V]}]}];

TxtPlots = Graphics3D[{Black, Text["U", {-3, 4, 9}], Text["V", {6,
-1, 10}], Text["U x V", {10, 10, -5}]}];

RightAnglePlots = Graphics3D[{Black, Polygon[{{0, 0, 0}, {-3/4,
7/4, 11/4}, {-3/4 + 74/40, 7/4 + 93/40, 11/4 - 39/40}, {74/40,
93/40, -39/40}}], Black, Polygon[{{0, 0, 0}, {6/4, -1/4, 9/4}, {6/4
+ 74/40, -1/4 + 93/40, 9/4 - 39/40}, {74/40, 93/40, -39/40}}]}];
Show[ArrowPlots, TxtPlots, RightAnglePlots, Axes -> True]


From Figure 6.11, we should recognize that the cross product satisfies the
right-hand rule, which states that if you place your right hand so that your
fingers bend from $\vec{u}$ to $\vec{v}$, then your thumb will point in the
direction of $\vec{u} \times \vec{v}$.


As stated previously, vectors have two properties: direction and
magnitude. We have the direction of the new vector, which is
perpendicular to both $\vec{u}$ and $\vec{v}$. We next determine a formula
relating the magnitude of $\vec{u} \times \vec{v}$ to the magnitudes of
$\vec{u}$ and $\vec{v}$. If $\vec{u} = (u_1, u_2, u_3)$ and
$\vec{v} = (v_1, v_2, v_3)$, then

$$\vec{u} \times \vec{v} = (u_2 v_3 - v_2 u_3,\; -(u_1 v_3 - v_1 u_3),\; u_1 v_2 - v_1 u_2)$$

and

$$|\vec{u} \times \vec{v}|^2 = (u_2 v_3 - v_2 u_3)^2 + (u_1 v_3 - v_1 u_3)^2 + (u_1 v_2 - v_1 u_2)^2$$


We will now use Mathematica to perform a few algebraic manipulations
to help prove the identity:

(6.11)  $|\vec{u} \times \vec{v}|^2 = |\vec{u}|^2 |\vec{v}|^2 - (\vec{u} \cdot \vec{v})^2$


U = {u1, u2, u3}; V = {v1, v2, v3};
Expand[(u2 v3 - v2 u3)^2 + (u1 v3 - v1 u3)^2 + (u1 v2 - v1 u2)^2]
u2^2 v1^2 + u3^2 v1^2 - 2 u1 u2 v1 v2 + u1^2 v2^2 + u3^2 v2^2 - 2 u1 u3 v1 v3 - 2 u2 u3 v2 v3 + u1^2 v3^2 + u2^2 v3^2

NormU = U.U
u1^2 + u2^2 + u3^2

NormV = V.V
v1^2 + v2^2 + v3^2

RealDotUV = U.V
u1 v1 + u2 v2 + u3 v3

Expand[NormU NormV - RealDotUV^2]
u2^2 v1^2 + u3^2 v1^2 - 2 u1 u2 v1 v2 + u1^2 v2^2 + u3^2 v2^2 - 2 u1 u3 v1 v3 - 2 u2 u3 v2 v3 + u1^2 v3^2 + u2^2 v3^2


Now we know from this last Mathematica calculation that equation (6.11)
holds true. With only a simple substitution of equation (6.6),
$\vec{u} \cdot \vec{v} = |\vec{u}|\,|\vec{v}| \cos(\theta)$, into equation
(6.11), we arrive at

$$\begin{aligned}
|\vec{u} \times \vec{v}|^2 &= |\vec{u}|^2 |\vec{v}|^2 \left(1 - \cos^2(\theta)\right) \\
&= |\vec{u}|^2 |\vec{v}|^2 \sin^2(\theta)
\end{aligned}$$

Now we have found an expression for the norm of $\vec{u} \times \vec{v}$
involving the magnitudes of $\vec{u}$, $\vec{v}$, and the angle $\theta$
between them:

(6.12)  $|\vec{u} \times \vec{v}| = |\vec{u}|\,|\vec{v}| \sin(\theta)$

To interpret this, note that the maximum magnitude of
$\vec{u} \times \vec{v}$ occurs when the vectors are orthogonal, i.e.,
$\theta = \pi/2$ and thus $\sin(\theta) = 1$, yielding a magnitude of
$|\vec{u}|\,|\vec{v}|$. As the vectors $\vec{u}$ and $\vec{v}$ become closer
to lying on the same line ($\theta = 0$ or $\theta = \pi$), the magnitude of
their cross product tends to 0.
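The behavior described by equation (6.12) can be seen on a small example. The sketch below assumes two unit vectors in the xy-plane, (1, 0, 0) and (cos θ, sin θ, 0), chosen here purely for illustration, and tabulates the length of their cross product at a few angles.

```mathematica
(* Length of the cross product of two unit vectors separated by angle theta *)
crossLength[theta_] := Norm[Cross[{1, 0, 0}, {Cos[theta], Sin[theta], 0}]]

(* Parallel, 30 degrees apart, orthogonal, and antiparallel *)
Table[crossLength[theta], {theta, {0, Pi/6, Pi/2, Pi}}]   (* {0, 1/2, 1, 0} *)
```

Since both vectors have length 1, the tabulated values are just sin(θ): largest when the vectors are orthogonal and 0 when they lie on the same line.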


Next we turn to an application of the cross product. Similar to the dot
product, the cross product can give us a very nice way of computing the
area of a triangle, albeit in $\mathbb{R}^2$ or $\mathbb{R}^3$ only. Since the
cross product only applies to vectors in $\mathbb{R}^3$, if the triangle is
in $\mathbb{R}^2$, adding a third component with value 0 to each vector in
order to move it to the xy-plane in $\mathbb{R}^3$ will allow the following
computations to hold. From Section 6.2, we know that the area A of a
triangle in $\mathbb{R}^n$ with two adjacent side vectors $\vec{v}$ and
$\vec{w}$ is given by (6.8):

$$A = \frac{1}{2} \sqrt{|\vec{v}|^2 |\vec{w}|^2 - (\vec{v} \cdot \vec{w})^2}$$

We just proved (6.11), whose RHS is the expression under the radical of
(6.8). Therefore, we now have an expression for the area of a triangle
involving the cross product:

(6.13)  $A = \frac{1}{2} |\vec{v} \times \vec{w}|$


This area formula is quite simple and straightforward to compute; 
however, as discussed, it applies only in dimensions two and three, as 
opposed to the previous formula with dot products that applies in any 
dimension. 
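To make the padding idea concrete, here is a minimal sketch of formula (6.13) that accepts three vertices in either two or three dimensions. The function name triangleArea is hypothetical and is introduced only for this illustration; PadRight appends the zero third component when the points are two-dimensional.

```mathematica
(* Hypothetical helper: area of the triangle with vertices p, q, r in R^2 or R^3 *)
triangleArea[{p_, q_, r_}] := Module[{a, b},
  a = PadRight[q - p, 3];   (* side vector from p to q, padded with 0 if 2D *)
  b = PadRight[r - p, 3];   (* side vector from p to r, padded with 0 if 2D *)
  Norm[Cross[a, b]]/2]

triangleArea[{{0, 0}, {4, 0}, {0, 3}}]            (* 6 *)
triangleArea[{{0, 0, 0}, {1, 0, 0}, {0, 1, 0}}]   (* 1/2 *)
```

The first call is a 3-4-5 right triangle, so the expected area is (1/2)(4)(3) = 6.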

Example 6.3.2. Now let us use all of our information, including equations (6.8) and
(6.13), to test each other's correctness for computing the area of the triangle in
$\mathbb{R}^3$ with two adjacent side vectors $\vec{v} = (9, -5, -13)$ and
$\vec{w} = (-4, 7, 2)$.


V = {9, -5, -13}; W = {-4, 7, 2};
VxW = Cross[V, W]
{81, 34, 43}
Area1 = 1/2 Norm[VxW]
Sqrt[4783/2]
Area2 = 1/2 Sqrt[Norm[V]^2 Norm[W]^2 - (V.W)^2]
Sqrt[4783/2]
θ = N[ArcCos[(V.W)/(Norm[V] Norm[W])]]
2.35206
N[Norm[VxW]]
97.8059
LengthVxW = Norm[V] Norm[W] Sin[θ]
97.8059


The calculations above verify the correctness of all of our formulas
involving the dot and cross products. Finally, we include a small table of
cross product properties, including two relationships involving both the
dot product and cross product.


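Cross Product Properties

Basic properties:
    $\vec{u} \times \vec{v} = -(\vec{v} \times \vec{u})$
    $\vec{u} \times (\vec{v} + \vec{w}) = (\vec{u} \times \vec{v}) + (\vec{u} \times \vec{w})$
    $(a\vec{u}) \times \vec{v} = \vec{u} \times (a\vec{v}) = a(\vec{u} \times \vec{v})$

Cross and dot products:
    $\vec{u} \cdot (\vec{v} \times \vec{w}) = (\vec{u} \times \vec{v}) \cdot \vec{w}$
    $\vec{u} \times (\vec{v} \times \vec{w}) = (\vec{u} \cdot \vec{w})\,\vec{v} - (\vec{u} \cdot \vec{v})\,\vec{w}$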


Homework Problems 
1. Given $\vec{u} = (1, 2, -1)$, $\vec{v} = (-3, -1, 4)$, and $\vec{w} = (5, -1, 0)$, compute
the following. As defined previously, $\vec{i} = (1, 0, 0)$, $\vec{j} = (0, 1, 0)$,
and $\vec{k} = (0, 0, 1)$ are the three standard unit vectors of $\mathbb{R}^3$.
(a) 7x? (b) wx (c) @xV¥ 


(d) wx? (e) (Wx W)x@ (f) Wx (vx @) 


(e) Px? (h) xE ü JFxa 


2. Compute the areas of each of the triangles defined by the 
following sets of points: 


(a) {(0,1), (2,3), (6,2)} (b) {(0,1,2), (2,3,1), (1,6,2)} 


3. Prove the following properties of the cross product: 


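(a) $\vec{i} \times \vec{j} = \vec{k}$    (b) $\vec{j} \times \vec{k} = \vec{i}$    (c) $\vec{k} \times \vec{i} = \vec{j}$
(d) $\vec{j} \times \vec{i} = -\vec{k}$   (e) $\vec{k} \times \vec{j} = -\vec{i}$   (f) $\vec{i} \times \vec{k} = -\vec{j}$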


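4. Given $\vec{u}, \vec{v}, \vec{w} \in \mathbb{R}^3$ and $a \in \mathbb{R}$, verify the
following identities (Property (c) is an example of a Jacobi identity):

(a) $\vec{u} \times (\vec{v} + \vec{w}) = (\vec{u} \times \vec{v}) + (\vec{u} \times \vec{w})$
(b) $(a\vec{u}) \times \vec{v} = \vec{u} \times (a\vec{v})$
(c) $\vec{u} \times (\vec{v} \times \vec{w}) + \vec{v} \times (\vec{w} \times \vec{u}) + \vec{w} \times (\vec{u} \times \vec{v}) = \vec{0}$


5. Given $\vec{v}(t) = (\cos(t), \sin(t), 0)$, show that the angle $\theta$
between $\vec{v}(t)$ and $\vec{i}$ satisfies $\sin(\theta) = |\sin(t)|$, and that
$|\vec{v}(t) \times \vec{i}|^2 = \sin^2(t)$.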


6. Verify the two cross product and dot product properties in the 
table at the end of the section. 
7. (a) Let ax + by + cz = d be the equation of a plane. Show that for
any two points P and Q in this plane, the displacement vector
$\overrightarrow{PQ}$ is perpendicular to $\vec{n} = (a, b, c)$; that is,
$\vec{n} = (a, b, c)$ is perpendicular to the plane ax + by + cz = d.
(b) Let ax + by + cz = d and ex + fy + gz = h be two 
intersecting planes. Find the equation of the plane through the 
origin perpendicular to these two given planes. 
(c) Let 5x + 2y + 7z = -9 and 3x — 4y + 11z = 1 be two 
intersecting planes. Find the equation of the plane through the 
origin perpendicular to these two given planes. Find the 
equation of the plane through P(-1, 0, 4) perpendicular to these 
two given planes. 


8. (a) Let x = at + α, y = bt + β, z = ct + γ be the parametric equation
of a line in space for $t \in \mathbb{R}$. Show that for any two points P and Q
on this line, the displacement vector $\overrightarrow{PQ}$ is parallel to
$\vec{v} = (a, b, c)$; that is, $\vec{v} = (a, b, c)$ is parallel to the line
x = at + α, y = bt + β, z = ct + γ. Note that when t = 0, we have that
(α, β, γ) is a point on this line. In addition, this parametric equation for
a line in space is the spatial version of the point-slope formula for a line
in the xy-plane.
(b) Let px + qy + rz = s and ex + fy + gz = h be two
intersecting planes. Find a vector parallel to their line of
intersection.
(c) Let 5x + 2y + 7z = -9 and 3x - 4y + 11z = 1 be two
intersecting planes. Find the parametric equation for their line
of intersection.
9. Using problem 7, find the equation of the plane through the three
points P(1, 5, 9), Q(-3, 4, -8), and R(7, -2, 6).
10. (See problem 8.) (a) Let x = at + α, y = bt + β, z = ct + γ and
x = dt + δ, y = et + ε, z = ft + ζ be two intersecting lines in space.
Find a vector perpendicular to the plane through these two lines.
(b) Find the equation of the plane through the two intersecting
lines x = 5t + 2, y = -3t + 1, z = 4t - 5, and x = -7t + 2, y = 2t +
1, z = 9t - 5.
11. Explain why the associative property of the cross product is
false; that is, explain why

$$\vec{u} \times (\vec{v} \times \vec{w}) \neq (\vec{u} \times \vec{v}) \times \vec{w}$$

Give a general example to illustrate that this is correct.

12. Can the cross product be defined for complex vectors in $\mathbb{C}^3$? If
no, explain why not. If yes, then do all of the properties of the cross
product still hold if we switch to the complex dot product?


Mathematica Problems 


1. Run the following Mathematica manipulation and explain the 
change in direction and magnitude of the cross product vector 
depicted in red: 


Manipulate[
Column[{Show[Graphics3D[{Blue, Thickness[.01], Arrow[{{0, 0, 0},
{1, 0, 0}}], Darker[Green], Arrow[{{0, 0, 0}, {Cos[t], Sin[t], 0}}],
Red, Arrow[{{0, 0, 0}, Cross[{1, 0, 0}, {Cos[t], Sin[t], 0}]}]}],
PlotRange -> {{-1, 1}, {-1, 1}, {-1, 1}}, BoxRatios -> {1, 1, 1}, Axes
-> True, AxesEdge -> {Automatic, {1, -1}, {-1, -1}}, AxesLabel ->
{"x", "y", "z"}, Ticks -> {{-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1}}], Row[
{"The blue vector {1,0,0} crossed with the green vector ", Chop[
N[{Cos[t], Sin[t], 0}]], " is the red vector ", Chop[N[Cross[{1, 0,
0}, {Cos[t], Sin[t], 0}]]]}], "which has a length of ", Chop[N[Norm[
Cross[{1, 0, 0}, {Cos[t], Sin[t], 0}]]]]}],
{{t, 0, "Angle "}, 0, 2 Pi, N[Pi/48]}, SaveDefinitions -> True]


2. Given the vector $\vec{u} = (0, 1, 1)$ and the vector function
$\vec{v}(t) = (\cos(t), 1 + 2\sin(t), -\sin(t))$, perform the following:
(a) Plot the angle $\theta$ between $\vec{u}$ and $\vec{v}(t)$ as a function of t
for $0 \le t \le 2\pi$.
(b) Find any values t such that $\vec{u} \perp \vec{v}(t)$.
(c) Plot $|\vec{u} \times \vec{v}(t)|$.
(d) Find the maximum value of $|\vec{u} \times \vec{v}(t)|$ for $0 \le t \le 2\pi$.
(e) Plot both $|\vec{u} \times \vec{v}(t)|$ and $\theta(t)$ on the same graph and
compare results.
(f) Graph the vectors $\vec{u}$, $\vec{v}(t)$, and $\vec{u} \times \vec{v}(t)$ for
$0 \le t \le 2\pi$ using the code from problem 1.
3. Given the following vector functions

$$\vec{u}(t) = (\sin(t), \cos(t), 0), \qquad \vec{v}(t) = (\cos(t), 2\sin(t), -\sin(t))$$

perform the following:
(a) Plot the angle $\theta$ between $\vec{u}(t)$ and $\vec{v}(t)$ as a function of t
for $0 \le t \le 2\pi$.
(b) Find any values t such that $\vec{u}(t) \perp \vec{v}(t)$.
(c) Plot $|\vec{u}(t) \times \vec{v}(t)|$.
(d) Find the maximum value of $|\vec{u}(t) \times \vec{v}(t)|$ for $0 \le t \le 2\pi$.
(e) Plot both $|\vec{u}(t) \times \vec{v}(t)|$ and $\theta(t)$ on the same graph and
compare results.
(f) Graph the vectors $\vec{u}(t)$, $\vec{v}(t)$, and $\vec{u}(t) \times \vec{v}(t)$
for $0 \le t \le 2\pi$.
Modifying the code from problem 1 may be the easiest method
for solving this problem.


4. Do homework problem 7(c). 
5. Do homework problem 8(c). 
6. Do homework problem 9. 

7. Do homework problem 10(b). 




6.4 Vector Projection 


In this section, we are interested in the idea of projecting one vector
$\vec{w}$ perpendicularly onto another vector $\vec{v}$, resulting in a third
vector in the direction of $\vec{v}$. At first glance, this process may
appear to be a mental exercise only; however, there are many situations in
which it can be applied. First, we start by letting $\vec{v}$ and $\vec{w}$ be
two nonzero vectors of the same dimension that are not scalar multiples of
each other, in which case they are said to be independent vectors. These
vectors thus determine different lines, and so together they determine a
unique plane that contains both of them. We define the vector projection of
the vector $\vec{w}$ onto the vector $\vec{v}$, denoted
$\mathrm{proj}_{\vec{v}}(\vec{w})$, to be the vector that, as an arrow,
starts at the common starting point of $\vec{v}$'s and $\vec{w}$'s arrows,
while stopping at the point on $\vec{v}$'s arrow that is perpendicularly
below the endpoint of $\vec{w}$'s arrow.


Example 6.4.1. Before we get into the mathematics behind projection, we will project the
vector $\vec{w} = (9, 8)$ onto the vector $\vec{v} = (17, 0)$.


Figure 6.12: The projection of $\vec{w}$ onto $\vec{v}$, denoted $\mathrm{proj}_{\vec{v}}(\vec{w})$.


W = {9, 8}; V = {17, 0}; ProjWontoV = {9, 0};

ArrowPlots1 = Graphics[{Arrowheads[.10], Thickness[.010], Blue,
Arrow[{{0, 0}, W}], Black, Arrow[{{0, 0}, V}], Red, Arrow[{{0,
0}, ProjWontoV}]}];

TxtPlots1 = Graphics[{Black, Text["W", {8, 8}], Text["V", {17,
0.5}], Text["ProjWontoV", {4, -0.75}], Text["θ", {3.25, 1.1}]}];

ArcPlot1 = Graphics[{Black, Thickness[0.010], Circle[{0, 0}, 5.5, {0,
ArcTan[8/9]}]}];

LinePlots1 = Graphics[{Thickness[.005], Black, Line[{W,
ProjWontoV}]}];

RightAnglePlot1 = Graphics[{Black, Rectangle[ProjWontoV,
ProjWontoV + {1, 1}]}];

Show[ArrowPlots1, TxtPlots1, ArcPlot1, LinePlots1, RightAnglePlot1]


The example above is quite simple. Notice that $\vec{v} = (17, 0)$ lies along the x-axis; so, to
project the vector $\vec{w} = (9, 8)$ onto $\vec{v}$, we simply take the x-component of $\vec{w}$, since
the direction perpendicular to $\vec{v}$ is the y-direction. Therefore,
$\mathrm{proj}_{\vec{v}}(\vec{w}) = (9, 0)$. From Figure 6.12, it should also be clear that the
vector $\vec{w} - \mathrm{proj}_{\vec{v}}(\vec{w})$ is orthogonal to $\vec{v}$:

$$\vec{v} \cdot \left(\vec{w} - \mathrm{proj}_{\vec{v}}(\vec{w})\right) = 0$$

Now, we want to get a formula for the vector projection
$\mathrm{proj}_{\vec{v}}(\vec{w})$ of $\vec{w}$ onto $\vec{v}$ in terms of arbitrary
vectors $\vec{v}$ and $\vec{w}$ of the same dimension. On further examination
of Figure 6.12, we see that $\mathrm{proj}_{\vec{v}}(\vec{w})$ is a positive
multiple of $\vec{v}$ since its arrow moves along $\vec{v}$'s arrow in the
same direction as $\vec{v}$. As a result of this observation, the following
equality must hold:

(6.14)  $\dfrac{\mathrm{proj}_{\vec{v}}(\vec{w})}{|\mathrm{proj}_{\vec{v}}(\vec{w})|} = \dfrac{\vec{v}}{|\vec{v}|}$


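This is true since both vectors in this expression are unit vectors and point
in the same direction. Solving for $\mathrm{proj}_{\vec{v}}(\vec{w})$ in
equation (6.14) gives

(6.15)  $\mathrm{proj}_{\vec{v}}(\vec{w}) = |\mathrm{proj}_{\vec{v}}(\vec{w})| \, \dfrac{\vec{v}}{|\vec{v}|}$

Also, from the right triangle with hypotenuse $\vec{w}$, leg
$\mathrm{proj}_{\vec{v}}(\vec{w})$, and angle $\theta$ between them,

(6.16)  $|\mathrm{proj}_{\vec{v}}(\vec{w})| = |\vec{w}| \cos(\theta)$

Remember, magnitude and direction are the two properties that define a
vector, and we now have both. By putting these together, we see that

$$\mathrm{proj}_{\vec{v}}(\vec{w}) = |\vec{w}| \cos(\theta) \, \frac{\vec{v}}{|\vec{v}|}$$

This can be simplified even further if we multiply the RHS by
$|\vec{v}| / |\vec{v}|$:

$$\mathrm{proj}_{\vec{v}}(\vec{w}) = \frac{|\vec{w}|\,|\vec{v}| \cos(\theta)}{|\vec{v}|^2} \, \vec{v}$$

The dot product formula states that
$\vec{w} \cdot \vec{v} = |\vec{w}|\,|\vec{v}| \cos(\theta)$; therefore
we arrive at the final formulation of the projection of $\vec{w}$ onto
$\vec{v}$:

(6.17)  $\mathrm{proj}_{\vec{v}}(\vec{w}) = \dfrac{\vec{w} \cdot \vec{v}}{|\vec{v}|^2} \, \vec{v}$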


Note that we could also write (6.17) as

$$\mathrm{proj}_{\vec{v}}(\vec{w}) = \left(\vec{w} \cdot \frac{\vec{v}}{|\vec{v}|}\right) \frac{\vec{v}}{|\vec{v}|}$$

where the vector $\vec{v}/|\vec{v}|$ is the unit vector pointing in the
direction of $\vec{v}$. If we compute the unit vector in the direction of
$\vec{v}$ first, the projection formula becomes much simpler. If we set
$\hat{v} = \vec{v}/|\vec{v}|$ in the formula above, we have

(6.18)  $\mathrm{proj}_{\vec{v}}(\vec{w}) = (\vec{w} \cdot \hat{v}) \, \hat{v}$

Furthermore, notice that the magnitude of the projection of $\vec{w}$ onto
$\vec{v}$ is simply

$$|\mathrm{proj}_{\vec{v}}(\vec{w})| = \left| \vec{w} \cdot \frac{\vec{v}}{|\vec{v}|} \right| = \frac{|\vec{w} \cdot \vec{v}|}{|\vec{v}|}$$

The expression that lies inside the magnitude signs above is referred to as
the component of $\vec{w}$ onto $\vec{v}$, denoted
$\mathrm{comp}_{\vec{v}}(\vec{w})$:

(6.19)  $\mathrm{comp}_{\vec{v}}(\vec{w}) = \dfrac{\vec{w} \cdot \vec{v}}{|\vec{v}|}$

The component can be regarded as the signed magnitude of the projection
vector. Note that if the vector $\vec{w}$ should point in the direction
opposite to the general direction of $\vec{v}$, then the formula for
computing the projection vector $\mathrm{proj}_{\vec{v}}(\vec{w})$ of
$\vec{w}$ onto $\vec{v}$ still holds. The projection is performed, not with
respect to $\vec{v}$, but instead to the line that $\vec{v}$ determines. This
situation occurs when $\theta > \pi/2$, as Figure 6.13 illustrates.
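The projection and component formulas (6.17) and (6.19) translate directly into short Mathematica definitions. The sketch below is only an illustration: the names projOnto and compOnto are hypothetical helpers introduced here, while Projection is Mathematica's built-in function, included for comparison on real vectors.

```mathematica
(* Hypothetical helpers implementing (6.17) and (6.19) *)
projOnto[w_, v_] := (w . v)/(v . v) v
compOnto[w_, v_] := (w . v)/Norm[v]

projOnto[{9, 8}, {17, 0}]     (* {9, 0} *)
projOnto[{9, 8}, {-17, 0}]    (* {9, 0}: same line, so same projection *)
compOnto[{9, 8}, {17, 0}]     (* 9 *)
compOnto[{9, 8}, {-17, 0}]    (* -9: negative since the angle exceeds Pi/2 *)

(* Built-in equivalent for real vectors *)
Projection[{9, 8}, {17, 0}]   (* {9, 0} *)
```

Reversing the direction of the second vector leaves the projection unchanged but flips the sign of the component, which is the sense in which the component is a signed magnitude.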


Example 6.4.2. Once again we will use the vector $\vec{w} = (9, 8)$, this time projecting it
onto the vector $\vec{v} = (-17, 0)$.


Figure 6.13: Projection of $\vec{w}$ onto $\vec{v}$ with $\theta > \pi/2$.


W = {9, 8}; V = {-17, 0}; ProjWontoV = {9, 0};

ArrowPlots2 = Graphics[{Arrowheads[.10], Thickness[.010], Blue,
Arrow[{{0, 0}, W}], Black, Arrow[{{0, 0}, V}], Red, Arrow[{{0,
0}, ProjWontoV}]}];

TxtPlots2 = Graphics[{Black, Text["W", {8, 8}], Text["V", {-17,
0.5}], Text["ProjWontoV", {4, -0.75}], Text["θ", {-1., 1.75}]}];

ArcPlot2 = Graphics[{Black, Thickness[0.010], Circle[{0, 0}, 5.5,
{ArcTan[8/9], Pi}]}];

LinePlots2 = Graphics[{Thickness[.005], Black, Line[{W,
ProjWontoV}]}];

RightAnglePlot2 = Graphics[{Black, Rectangle[ProjWontoV,
ProjWontoV + {1, 1}]}];

Show[ArrowPlots2, TxtPlots2, ArcPlot2, LinePlots2, RightAnglePlot2]


Example 6.4.3. Now let us do a vector projection computation in $\mathbb{R}^3$, with
$\vec{v} = (-5, 9, -12)$ and $\vec{w} = (7, 4, 8)$. We will compute the vector projection
$\mathrm{proj}_{\vec{v}}(\vec{w})$ of $\vec{w}$ onto $\vec{v}$ and plot all three vectors.
This is illustrated in Figure 6.14.

V = {-5, 9, -12}; W = {7, 4, 8};
ProjWontoV = (W.V)/Norm[V]^2 V
{19/10, -171/50, 114/25}


Figure 6.14: The projection process applied to a pair of
three-dimensional vectors.


ArrowPlots3 = Graphics3D[{Arrowheads[.1], Thickness[.015], Red,
Arrow[{{0, 0, 0}, W}], Black, Arrow[{{0, 0, 0}, V}], Blue,
Arrow[{{0, 0, 0}, ProjWontoV}]}];

TxtPlots3 = Graphics3D[{Black, Text["V", {-2, 4, -3}], Text["W",
{4, 3, 3}], Text["ProjWontoV", {2, -5, 2}]}];

Show[ArrowPlots3, TxtPlots3, Axes -> True, ViewPoint -> {-7.5, 3.5,
4}, AxesLabel -> {x, y, z}]


Vector projections have many useful applications. One of the simplest is
to derive a formula for the shortest (perpendicular) distance between a
point $P(x_0, y_0)$ and a line L given by the equation ax + by = c in the
xy-plane. We will denote this distance as D(P, L). To start, we need to find
a vector $\vec{n}$ perpendicular to the line L. Note that this also implies
that $\vec{n}$ would be orthogonal to any vector parallel to the line L. This
orthogonal vector is also sometimes referred to as a normal vector to the
line L. The vector $\vec{n} = (a, b)$ is such a vector, since if we take any
two points Q(x, y) and R(s, t) on this line, then the vector
$\overrightarrow{RQ} = (x - s, y - t)$ is parallel to the line L:

$$\begin{aligned}
\vec{n} \cdot \overrightarrow{RQ} &= a(x - s) + b(y - t) \\
&= ax + by - (as + bt) \\
&= c - c = 0
\end{aligned}$$

Then, we need to find a vector that moves from the point $P(x_0, y_0)$ to the
line L. We take the vector $\vec{w} = \overrightarrow{PQ} = (x - x_0, y - y_0)$,
where Q(x, y) is an arbitrary point on L. Now, thinking of the vector
$\vec{n}$, which is orthogonal to the line L, as also starting at the point
$P(x_0, y_0)$, we see that the vector projection
$\mathrm{proj}_{\vec{n}}(\vec{w})$ of $\vec{w}$ onto $\vec{n}$ is the vector
starting at the point $P(x_0, y_0)$ and ending at the point on the line L
closest to $P(x_0, y_0)$. The length of the vector projection
$|\mathrm{proj}_{\vec{n}}(\vec{w})|$ will be the shortest distance
from the point $P(x_0, y_0)$ to our line L. This situation is depicted in
Figure 6.15.


When we now put all of the information presented above together, we find
that the shortest distance from the point $P(x_0, y_0)$ to our line L with
equation ax + by = c is given by $|\mathrm{proj}_{\vec{n}}(\vec{w})|$ for the
vectors $\vec{n} = (a, b)$ and $\vec{w} = (x - x_0, y - y_0)$, where Q(x, y) is
any point on the line L. Putting this into a formula, and with some
algebraic manipulations, we get

$$\begin{aligned}
|\mathrm{proj}_{\vec{n}}(\vec{w})| &= \frac{|\vec{w} \cdot \vec{n}|}{|\vec{n}|} = \frac{|a(x - x_0) + b(y - y_0)|}{\sqrt{a^2 + b^2}} \\
&= \frac{|ax + by - ax_0 - by_0|}{\sqrt{a^2 + b^2}} = \frac{|c - ax_0 - by_0|}{\sqrt{a^2 + b^2}} \\
&= \frac{|ax_0 + by_0 - c|}{\sqrt{a^2 + b^2}}
\end{aligned}$$

So the shortest distance D(P, L) from the point $P(x_0, y_0)$ to the line
L: ax + by = c is given by

(6.20)  $D(P, L) = \dfrac{|ax_0 + by_0 - c|}{\sqrt{a^2 + b^2}}$
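Formula (6.20) is short enough to wrap in a one-line function. The following is a minimal sketch; the name distPointLine and its argument layout (the point first, then the coefficients {a, b, c} of ax + by = c) are assumptions made here for illustration.

```mathematica
(* Hypothetical helper for (6.20): distance from (x0, y0) to the line a x + b y == c *)
distPointLine[{x0_, y0_}, {a_, b_, c_}] := Abs[a x0 + b y0 - c]/Sqrt[a^2 + b^2]

(* Distance from the origin to 3x + 4y = 10 *)
distPointLine[{0, 0}, {3, 4, 10}]   (* 2 *)

(* A point on the line has distance 0 *)
distPointLine[{2, 1}, {3, 4, 10}]   (* 0 *)
```

For the first call, the numerator is |0 + 0 - 10| = 10 and the denominator is 5, giving a distance of 2.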


Example 6.4.4. Now, let us draw the picture to illustrate the derivation of equation
(6.20), and then do a shortest-distance calculation from the point P(-7, 13) to the line L
given by 20x - 32y = 165; refer to Figure 6.15 for this example. Note the formula given in
the definition of the variable LPerpP in the code below. This variable defines the
equation of the line through P perpendicular to L. To understand the formula given,
remember that any line perpendicular to L must be of the form 32x + 20y = C, for
arbitrary $C \in \mathbb{R}$, since slopes of perpendicular lines in $\mathbb{R}^2$ are
negative reciprocals of one another. In our case, the point P must satisfy the equation
32x + 20y = C; hence C = 32(-7) + 20(13). This is how LPerpP is defined.


Figure 6.15: Using projections to find the distance from a point to
a line.

L = 20 x - 32 y == 165;
LPerpP = 32 x + 20 y == 32 (-7) + 20*13
32 x + 20 y == 36

SolnPt = Solve[{L, LPerpP}, {x, y}][[1]]
{x -> 1113/356, y -> -285/89}

V = {20, -32};
P = {-7, 13}; Q = {14, 5/8*14 - 165/32};
ProjectPQtoV = (Q - P).V/Norm[V]^2 V
{3605/356, -1442/89}

N[Abs[(V.P - V.Q)/Norm[V]]]
19.1065

PtsPlots4 = Graphics[{PointSize[0.03], Blue, Point[{P, Q}]}];

LinePlot4 = ContourPlot[Evaluate[L], {x, -12, 20}, {y, -20, 20},
ContourStyle -> {{Black, Thickness[0.005]}, {Blue, Thickness[0.01]}},
PlotRange -> All, Frame -> False];

ArrowPlots4 = Graphics[{Arrowheads[.10], Thickness[.010], Blue,
Arrow[{P, P + V}], Black, Arrow[{P, Q}], Red, Arrow[{P, P +
ProjectPQtoV}]}];

TxtPlots4 = Graphics[{Black, Text["V", {8, -9}], Text["PQ", {5,
9}], Text["ProjectPQtoV", {-7, 5}], Text["L", {-10, -9}], Text["P",
{-9, 13}], Text["Q", {16, 3}]}];

RightAnglePlot4 = Graphics[{Black, Polygon[{{x, y} /. SolnPt,
({x, y} + {1, -1.6}) /. SolnPt, ({x, y} + {2.6, -0.6}) /. SolnPt,
({x, y} + {1.6, 1}) /. SolnPt}]}];

Show[LinePlot4, ArrowPlots4, TxtPlots4, PtsPlots4, RightAnglePlot4]


Now we will take things one step further. What if we desire to know the
distance from a point to a plane? Referring back to Section 2.1, we
determined that the equation for a plane was given by ax + by + cz = d.
From our discussion of the distance from a point to a line, we noticed that
the vector representing the orthogonal direction to the line was given by
the coefficients in front of the x and y variables. A similar argument holds
true for the three-dimensional case that gives the vector
$\vec{n} = (a, b, c)$ as orthogonal to the plane ax + by + cz = d, which we
will call R.


Given a point on the plane, call it $(x_1, y_1, z_1)$, it must satisfy
$ax_1 + by_1 + cz_1 = d$. Plugging this in for d and moving everything to the
LHS of the equation of the plane R gives
$a(x - x_1) + b(y - y_1) + c(z - z_1) = 0$. Now we relate this to the
dot product that we just recently introduced; if we define the two vectors
$\vec{n} = (a, b, c)$ and $\vec{w} = (x - x_1, y - y_1, z - z_1)$, notice that

(6.21)  $a(x - x_1) + b(y - y_1) + c(z - z_1) = \vec{n} \cdot \vec{w} = 0$

Geometrically speaking, this states that given a point $P_1(x_1, y_1, z_1)$ and
direction $\vec{n}$, any other point P(x, y, z) that lies on the plane must
satisfy the equation

$$\vec{n} \cdot (P - P_1) = 0$$

Applying this process to planes of arbitrary dimension yields the
following definition.


Definition 6.4.1. Given a plane in n-dimensional space defined by the
equation $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = d$, the normal vector
$\vec{n}$ is independent of the value of d and is given by
$\vec{n} = (a_1, a_2, \ldots, a_n)$.


How does the definition of the normal vector help us in our case? Well,
similar to the solution to the distance of a point from a line, the shortest
distance from a point to a plane is found by drawing a line segment
perpendicular to the plane passing through the point. The direction of this
perpendicular line in three dimensions is $\vec{n} = (a, b, c)$. Now, once we
have this, we notice that the previous two-dimensional approach also
describes our situation if we were looking at the plane directly from the
side. The entirety of the previous argument still holds for the
three-dimensional case. So the shortest distance, D(P, R), from the point
$P(x_0, y_0, z_0)$ to the plane R: ax + by + cz = d, is given by

(6.22)  $D(P, R) = \dfrac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}}$


Imagine now how this could be modified for higher-dimensional 
problems. 
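One possible way to carry (6.22) over to higher dimensions is to write the numerator as a dot product with the normal vector of Definition 6.4.1 and the denominator as its norm. The sketch below assumes that generalization; the function name distPointHyperplane and its argument order are hypothetical choices made for this illustration.

```mathematica
(* Hypothetical helper: distance from point p to the plane n . x == d in any dimension *)
distPointHyperplane[p_, n_, d_] := Abs[n . p - d]/Norm[n]

(* Two-dimensional check against (6.20): origin to 3x + 4y = 10 *)
distPointHyperplane[{0, 0}, {3, 4}, 10]               (* 2 *)

(* Four-dimensional example: (1, 1, 1, 1) to w + x + y + z = 0 *)
distPointHyperplane[{1, 1, 1, 1}, {1, 1, 1, 1}, 0]    (* 2 *)
```

Because the definition only uses Dot and Norm, the same code handles lines in the plane, planes in space, and their higher-dimensional analogs without modification.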


Example 6.4.5. Now, we will do an example to illustrate this formula for a point and a
plane. The plane is given by 2x - 3y + 6z = 1, and we wish to find the distance from the
plane to the point P(1, 1, 7). Furthermore, we will choose the point Q(1, 1, 1/3) on the
plane for which to perform our computations.


Before we begin the computations, we should discuss how to set up the equation of a line
L in $\mathbb{R}^3$, given a point P and a direction $\vec{v}$. For this example, the point
is P(1, 1, 7), and the direction will be $\vec{v} = (2, -3, 6)$, the normal vector to the
plane 2x - 3y + 6z = 1. If we choose any other point (x, y, z) on the line L, then the
following equation must hold:

$$(x, y, z) - (1, 1, 7) = t\,(2, -3, 6)$$

for some real value t. This is true since a line always points in the same direction; hence
the difference between two points on the line, yielding a displacement vector, must also
point in the direction of the line, and must therefore be a scalar multiple of this direction.
Looking at this in terms of components yields the set of three equations x - 1 = 2t,
y - 1 = -3t, z - 7 = 6t, or x = 2t + 1, y = -3t + 1, and z = 6t + 7, which is the parametric
equation for the line perpendicular to the plane 2x - 3y + 6z = 1, which passes through the
point P(1, 1, 7). Solving each of these for t gives

(6.23)  $t = \dfrac{x - 1}{2}, \quad t = \dfrac{y - 1}{-3}, \quad t = \dfrac{z - 7}{6}$

The rectangular equations of this line are

$$\frac{x - 1}{2} = \frac{y - 1}{-3}, \qquad \frac{y - 1}{-3} = \frac{z - 7}{6}$$

If we wish to know where the line L intersects the plane 2x - 3y + 6z = 1, we need a
system of three equations to solve for the three unknown coordinates x, y, and z. Setting
the first two, and last two, equations equal in system (6.23), and adding the condition that
the point must lie in the plane 2x - 3y + 6z = 1, gives the following system of three
equations:

(6.24)  $\dfrac{x - 1}{2} = \dfrac{y - 1}{-3}, \qquad \dfrac{y - 1}{-3} = \dfrac{z - 7}{6}, \qquad 2x - 3y + 6z = 1$

Solving system (6.24) for x, y, and z gives the point

$$\left(-\frac{31}{49},\; \frac{169}{49},\; \frac{103}{49}\right)$$

This point lies both on the line L and the plane 2x - 3y + 6z = 1, and must be the closest
point on the plane to the point P, since L has direction normal to the plane. Mathematica
will verify this below with the help of projections:


Plane = 2 x - 3 y + 6 z == 1

LPerpThruPlane = {(x - 1)/2 == (y - 1)/(-3), (y - 1)/(-3) == (z - 7)/6}

SolnPt = Solve[Append[LPerpThruPlane, Plane], {x, y, z}]
{{x -> -31/49, y -> 169/49, z -> 103/49}}

Plane /. SolnPt
{True}


The calculation above verifies that the point found in the prior Solve command actually
does lie on the plane. We can now calculate the distance from the point to the plane.


V = {2, -3, 6}; P = {1, 1, 7}; Q = {1, 1, 1/3};
ProjPQtoV = ((Q - P).V)/Norm[V]^2 V
{-80/49, 120/49, -240/49}

P + ProjPQtoV
{-31/49, 169/49, 103/49}

N[Abs[(V.P - 1)/Norm[V]]]
5.71429


This is the actual value of the distance from the point to the plane. We now perform some
Mathematica commands to plot the resulting vectors and the plane to illustrate these
calculations (see Fig. 6.16):


Figure 6.16: Using projections to find the distance from a point to
a plane.


PtsPlots5 = Graphics3D[{Black, Sphere[P, 0.5], Black, Sphere[Q,
0.5], Black, Sphere[P + ProjPQtoV, 0.5]}];

PlanePlot5 = ContourPlot3D[Evaluate[Plane], {x, -10, 10}, {y,
-10, 10}, {z, -10, 10}, Mesh -> None, ContourStyle -> LightBlue,
AspectRatio -> 1/2];

ArrowPlots5 = Graphics3D[{Arrowheads[.05], Thickness[.010], Red,
Arrow[{P, P + V}], Black, Arrow[{P, Q}], Blue, Arrow[{P, P +
ProjPQtoV}]}];

TxtPlots5 = Graphics3D[{Black, Text["V", {2, -2, 10}], Text["PQ",
{2, 0, 3}], Text["ProjPQtoV", {-2, 6, 3}], Text["P", {1, 0.5, 7}],
Text["Q", {1, -1/2, 1/3}]}];

Show[PtsPlots5, ArrowPlots5, PlanePlot5, TxtPlots5, Axes -> True,
AxesLabel -> {x, y, z}, ViewPoint -> {-7, -5, 5}]


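Homework Problems


1. For $\vec{v}, \vec{w} \in \mathbb{R}^n$, under what conditions does
$\mathrm{proj}_{\vec{v}}(\vec{w}) = \vec{0}$?

2. For $\vec{v}, \vec{w} \in \mathbb{R}^n$, relate the sign of
$\mathrm{comp}_{\vec{v}}(\vec{w})$ to the angle $\theta$ between vectors
$\vec{v}$ and $\vec{w}$.

3. Compute $\mathrm{comp}_{\vec{v}}(\vec{w})$ for the following pairs of vectors:

(a) $\vec{v} = (-1, 2)$, $\vec{w} = (3, 5)$          (b) $\vec{v} = (4, 6)$, $\vec{w} = (2, 3)$
(c) $\vec{v} = (3, 0)$, $\vec{w} = (5, 1)$           (d) $\vec{v} = (-1, 0, 2)$, $\vec{w} = (3, 2, -2)$
(e) $\vec{v} = (1, 1, -1)$, $\vec{w} = (2, -1, 2)$   (f) $\vec{v} = (3, 0, 0)$, $\vec{w} = (-1, 1, 1)$
(g) $\vec{v} = (3, -2, -4)$, $\vec{w} = (-1, 2, 0)$  (h) $\vec{v} = (1, 1, -1, 1)$, $\vec{w} = (1, -1, 1, 1)$

4. Compute $\mathrm{proj}_{\vec{v}}(\vec{w})$ for the following pairs of vectors:

(a) $\vec{v} = (-1, 2)$, $\vec{w} = (3, 5)$          (b) $\vec{v} = (4, 6)$, $\vec{w} = (2, 3)$
(c) $\vec{v} = (3, 0)$, $\vec{w} = (5, 1)$           (d) $\vec{v} = (-1, 0, 2)$, $\vec{w} = (3, 2, -2)$
(e) $\vec{v} = (1, 1, -1)$, $\vec{w} = (2, -1, 2)$   (f) $\vec{v} = (3, 0, 0)$, $\vec{w} = (-1, 1, 1)$
(g) $\vec{v} = (3, -2, -4)$, $\vec{w} = (-1, 2, 0)$  (h) $\vec{v} = (1, 1, -1, 1)$, $\vec{w} = (1, -1, 1, 1)$

5. Compute the normal vectors to each of the following planes:

(a) 3x - 5y = 7          (b) 2x + 8y = 2
(c) x - 5y + 7z = 3      (d) 2w - 4x + 5y - 7z = 2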


6. Find the distance from the point P(2, 3) to the line 3x - 4y = 6.

7. Find the distance from the point P(1, -2, 4) to the plane 2x + 5y -
6z = 1.

8. Find the distance from the point P(1, -2, 4, -1) to the plane w +
3x - 2y + 2z = -1.

9. For each of problems 6-8, find the point that lies on the line or
plane at the location that minimizes distance between the line or
plane and P.

10. Can the ideas of this section be generalized to vector projection
of complex vectors in $\mathbb{C}^n$? If yes, then explain how and what still
works using the complex dot product. If no, then explain why not,
specifically stating what fails.

11. Find a formula for the shortest distance between the two parallel
planes ax + by + cz = d and ax + by + cz = e. Test your formula on
an example of two planes.

12. Find a formula for the shortest distance between the line x = at
+ α, y = bt + β, z = ct + γ, parallel to the plane ex + fy + gz = h.
Test your formula on an example of a parallel line and plane. Also,
how can we determine whether a line and plane in $\mathbb{R}^3$ are parallel?

13. Find a formula for the angle between two nonparallel planes in
space. Also, find a formula for the angle between a line and plane in
space that are nonparallel.


Mathematica Problems 


1. Given the vector $\vec{v} = (1, 0)$ and the vector function
$\vec{w}(t) = (\cos(t), \sin(t))$, perform the following:
(a) Plot $\mathrm{comp}_{\vec{v}}(\vec{w}(t))$ as a function of t for $0 \le t \le 2\pi$.
(b) Plot $\mathrm{proj}_{\vec{v}}(\vec{w}(t))$ as a function of t for $0 \le t \le 2\pi$
along with both $\vec{v}$ and $\vec{w}(t)$.
(c) Find the values of t that maximize and minimize
$|\mathrm{comp}_{\vec{v}}(\vec{w}(t))|$.
2. Repeat Mathematica problem 1 for the three-dimensional vectors
$\vec{v} = (0, 1, 1)$ and $\vec{w}(t) = (\cos(t), 1 + 2\sin(t), -\sin(t))$.
3. Graph the point, line, and displacement vector with base at P and
tip at the point on the line closest to P from homework problem 6.

4. Graph the point, plane, and displacement vector with base at P
and tip at the point on the plane closest to P from homework problem
7.

5. Let $\vec{v} = (4, 7)$ and $\vec{w} = (-5, 0)$. Animate the plot of the arrows
for $\mathrm{proj}_{\vec{v}}(c\vec{w})$, where the animation parameter c goes from -2 to
2. Be sure to include the arrows for both $\vec{v}$ and $\vec{w}$ in each frame of
the animation.


Chapter 7 


A Few Advanced Vector 
Algebra Topics 


7.1 Rotations in Space 


Next, we investigate a topic in computer graphics: that of rotating
parametric surfaces and curves in space about any line of rotation not
passing through them. For simplicity, we will only look at the case when
the line passes through the origin. The procedure that we develop can be
generalized to an arbitrary line of rotation, and you are asked to do so in
homework problem 5.


In order to rotate about a line in space, we need to understand a few
important concepts, the first of which is how to define a unique line in
space. Since the line passes through the origin, it is determined uniquely if
we know another point P on the line or equivalently any vector
$\vec{P} = \overrightarrow{OP}$ parallel to the line. Here O refers to the
origin. The concept of rotating a single point Q through an angle α about a
line L generalizes easily to rotating a parametric curve or surface about
the line, so let us begin with just a single point. As the point Q is
rotated (or revolved) about the line L (Q is not on L), the point Q will
travel along an arc of the circle C orthogonal to L where the center of the
circle C is the nearest point of L to Q.


To rotate the single point Q through an angle a about the line L, we will 
use vector operations and position vectors to find the circle of revolution 
C in which the point Q rotates about the line L. The circular arc along 
which Q travels should be perpendicular to the line L, with a radius equal
to the shortest (perpendicular) distance from Q to L. The circle's center C
corresponds to the point on the line L nearest to Q. The circle itself will be 
given parametrically in terms of the parameter a that is now the variable 
angle of rotation. Rotating Q along the arc of the circle through an angle 
of a will result in a new point N. For ease of computation, we will convert 
P and Q to vectors, leaving them capitalized to remember that they 
correspond to points. (Consider the fact that a point is really the terminal 
point of a vector starting at the origin.) 

First, we must find the vector $\vec{C}$, which is the center of the circle 
of rotation. This vector is found by computing the projection of the vector 
$\vec{Q}$ (our point) onto the vector $\vec{P}$ (parallel to our line through 
the origin). Using the vector projection formula from Section 6.4 gives 

(7.1)   $\vec{C} = \dfrac{\vec{P} \cdot \vec{Q}}{\vec{P} \cdot \vec{P}}\,\vec{P}$

The radius R of the circle of rotation is just the length of the vector 
$\vec{Q} - \vec{C}$, and so $R = |\vec{Q} - \vec{C}|$. 


Next, consider the following vector: 

(7.2)   $\vec{U} = \dfrac{\vec{Q} - \vec{C}}{R}$

which is of unit length, starting at C and pointing in the direction of Q. 
The vector $\vec{U}$ is in the plane of the circle of rotation. 


Another unit vector in the plane of the circle of rotation, perpendicular to 
both $\vec{U}$ and $\vec{C}$, is given by 

(7.3)   $\vec{V} = \dfrac{\vec{C} \times \vec{U}}{|\vec{C} \times \vec{U}|}$

where $\vec{V}$ is simply a scalar multiple of $\vec{C} \times \vec{U}$ and 
must therefore be perpendicular to both $\vec{C}$ and $\vec{U}$ by definition 
of the cross product. The vector $\vec{V}$ is made unit length simply by 
dividing the cross product by its magnitude. 


So what has this accomplished? Consider the following idea: In the plane 
of the circle of rotation, $\vec{U}$ and $\vec{V}$ are acting as the positive 
x- and y-axis unit vectors with origin at the center C, where our point Q is 
always positioned at the coordinates (R, 0) in the $(\vec{U}, \vec{V})$-coordinate 
system. Now the circle of rotation has the vector parametric equation given by 

(7.4)   $\vec{Q}_{rot}(\alpha) = \vec{C} + R\cos(\alpha)\,\vec{U} + R\sin(\alpha)\,\vec{V}$
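
As a quick consistency check on (7.4), the following short derivation assumes only what the construction above provides, namely that $\vec{U}$ and $\vec{V}$ are orthogonal unit vectors and that $\vec{C}$ is the projection from (7.1). It is a sketch of why the parametrization traces a circle of radius R in a plane perpendicular to L, not a replacement for the argument in the text.

```latex
\begin{aligned}
|\vec{Q}_{rot}(\alpha) - \vec{C}|^2
  &= R^2\cos^2(\alpha)\,(\vec{U}\cdot\vec{U})
   + 2R^2\cos(\alpha)\sin(\alpha)\,(\vec{U}\cdot\vec{V})
   + R^2\sin^2(\alpha)\,(\vec{V}\cdot\vec{V}) \\
  &= R^2\cos^2(\alpha) + R^2\sin^2(\alpha) = R^2, \\[4pt]
\vec{U}\cdot\vec{P}
  &= \frac{\vec{Q}\cdot\vec{P} - \vec{C}\cdot\vec{P}}{R}
   = \frac{\vec{Q}\cdot\vec{P} - \vec{P}\cdot\vec{Q}}{R} = 0,
\qquad
\vec{V}\cdot\vec{P} = 0 \ \text{ since } \vec{V} \perp \vec{C} \text{ and } \vec{C} \parallel \vec{P}.
\end{aligned}
```

The second line uses $\vec{C}\cdot\vec{P} = \vec{P}\cdot\vec{Q}$, which follows directly from (7.1).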
Example 7.1.1. For our first example, consider the following situation: We wish to rotate 
the point Q(3,5,9) through the angle $\alpha = \pi/3$ about the line L, which passes through the 
origin and the point P(-2,1,4). On examining equation (7.4), we see that 

$\vec{Q}_{rot}(0) = \vec{C} + R\,\vec{U}$

which by equation (7.2) is $\vec{Q}$. Then, by setting $\alpha = \pi/3$ we arrive at the point N as follows: 

$\vec{N} = \vec{Q}_{rot}\left(\tfrac{\pi}{3}\right) = \vec{C} + R\cos\left(\tfrac{\pi}{3}\right)\vec{U} + R\sin\left(\tfrac{\pi}{3}\right)\vec{V}$


P = {-2, 1, 4}; Q = {3, 5, 9}; 
Cntr = (P.Q)/(P.P) P 

{-10/3, 5/3, 20/3}

Rad = Norm[Q - Cntr] 

Sqrt[170/3]

U = (Q - Cntr)/Rad 

{19/Sqrt[510], Sqrt[10/51], 7/Sqrt[510]}

V = Cross[Cntr, U]/Norm[Cross[Cntr, U]] 

{-11/Sqrt[1190], 3 Sqrt[10/119], -13/Sqrt[1190]}

CircleRotation = Cntr + Rad Cos[A] U + Rad Sin[A] V 

{-10/3 + 19 Cos[A]/3 - 11 Sin[A]/Sqrt[21], 
 5/3 + 10 Cos[A]/3 + 10 Sqrt[3/7] Sin[A], 
 20/3 + 7 Cos[A]/3 - 13 Sin[A]/Sqrt[21]}

SpaceCircle = ParametricPlot3D[CircleRotation, {A, 0, 2 Pi}, 
PlotStyle -> {Blue}]; 

NewPt = CircleRotation /. {A -> Pi/3} 

{-1/6 - 11/(2 Sqrt[7]), 10/3 + 15/Sqrt[7], 47/6 - 13/(2 Sqrt[7])}

Figure 7.1: Rotating the point Q through the angle a about the line 
L gives the new point N. 


PtsPlot1 = Graphics3D[{Black, Sphere[Q, 0.5], Black, Sphere[NewPt, 
0.5], Black, Sphere[Cntr, 0.5]}]; 

TxtPlots1 = Graphics3D[{Black, Text["N", {-2.5, 10, 5}], Text["Q", 
{3.5, 5.5, 10}], Text["C", {-3.5, 2.5, 7.1}], Text["L", {-4, -3, 4}]}]; 
LinePlot1 = Graphics3D[{Thickness[.005], Red, Line[{{0, 0, 0}, 1.5 
Rad P}]}]; 

Show[SpaceCircle, PtsPlot1, LinePlot1, TxtPlots1] 
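
The steps of Example 7.1.1 can also be collected into a single reusable function. The sketch below is not from the text: the name rotateAboutLine and its local variables are hypothetical, and it assumes the line passes through the origin and that the point q is neither on the line nor perpendicular to it (otherwise the radius or the cross product vanishes).

```mathematica
(* Hypothetical helper, not part of the original text. Rotates the point q
   through the angle a about the line through the origin and the point p. *)
rotateAboutLine[p_, q_, a_] := Module[{c, r, u, v},
  c = (p.q)/(p.p) p;                   (* center of the circle, equation (7.1) *)
  r = Norm[q - c];                     (* radius of the circle of rotation *)
  u = (q - c)/r;                       (* unit vector from c toward q, equation (7.2) *)
  v = Cross[c, u]/Norm[Cross[c, u]];   (* second unit vector, equation (7.3) *)
  c + r Cos[a] u + r Sin[a] v          (* rotated point, equation (7.4) *)
]

(* Same data as Example 7.1.1 *)
N[rotateAboutLine[{-2, 1, 4}, {3, 5, 9}, Pi/3]]
```

With the data of Example 7.1.1 this should agree numerically with NewPt, roughly {-2.245, 9.003, 5.377}. Wrapping the computation in Module keeps the intermediate names local, so the global symbols U, V, and Rad used elsewhere in this section are left untouched.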


Now that we have the resources required to rotate a point about a line through the origin, 
as in Figure 7.1, we can determine what else is required to rotate more complicated 
objects about the line. A point is simply a zero-dimensional object in $\mathbb{R}^3$; a curve is 
a one-dimensional object in $\mathbb{R}^3$ and can be parameterized by a single variable. These 
one-dimensional curves in space are referred to as spacecurves. Spacecurves are given in 
vector form by $\langle x(t), y(t), z(t) \rangle$ for $t \in [a, b]$. On examining all the work done to revolve a 
single point Q about a line, it should become apparent that replacing Q(x,y,z) with 
$\langle x(t), y(t), z(t) \rangle$ for $t \in [a, b]$ changes nothing in the formulas. In other words, the formula 
holds for each fixed $t \in [a, b]$, and we can therefore consider the formulas to hold along 
the entire spacecurve for all $t \in [a, b]$. So, when we revolve a spacecurve about a line L, 
we are really just revolving each of the points of the spacecurve about L to produce 
another spacecurve. 
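
Written out, the substitution described above turns equations (7.1) through (7.4) into the following curve versions. The two-argument notation $\vec{Q}_{rot}(\alpha, t)$ is an assumption introduced here for illustration; the text itself does not name it.

```latex
\begin{aligned}
\vec{Q}(t) &= \langle x(t), y(t), z(t) \rangle, &
\vec{C}(t) &= \frac{\vec{P}\cdot\vec{Q}(t)}{\vec{P}\cdot\vec{P}}\,\vec{P}, \\
R(t) &= |\vec{Q}(t) - \vec{C}(t)|, &
\vec{U}(t) &= \frac{\vec{Q}(t) - \vec{C}(t)}{R(t)}, \\
\vec{V}(t) &= \frac{\vec{C}(t)\times\vec{U}(t)}{|\vec{C}(t)\times\vec{U}(t)|}, &
\vec{Q}_{rot}(\alpha, t) &= \vec{C}(t) + R(t)\cos(\alpha)\,\vec{U}(t) + R(t)\sin(\alpha)\,\vec{V}(t).
\end{aligned}
```

For a fixed angle $\alpha$ this is a spacecurve in t (the rotated curve), while for a fixed t it is the circle of revolution of the single point $\vec{Q}(t)$.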


Example 7.1.2. As an example, we will rotate a helix about the line L, which passes 
through the origin and the point P(-2,1,4), which is the same line used in Example 7.1.1. 
We define our circular helix vectorially by 

(7.5)   $\langle x(t), y(t), z(t) \rangle = \left\langle 6 + \tfrac{2}{3}\sin(t),\ 2 + \tfrac{2}{3}\cos(t),\ 5 + \tfrac{t}{4} \right\rangle$

for $t \in [-2\pi, 3\pi]$. The central axis for this helix is parallel to the z-axis and passes through 
the point (6,2,5), where each point of the helix is at a fixed distance of 2/3 from this 
central axis (see Fig. 7.2): 

Helix = {6 + 2/3 Sin[t], 2 + 2/3 Cos[t], 5 + t/4}; 
QBot = Helix /. {t -> -2 Pi} 

{6, 8/3, 5 - Pi/2}

QTop = Helix /. {t -> 3 Pi} 

{6, 4/3, 5 + 3 Pi/4}

Figure 7.2: Graph of the helix spacecurve that we will rotate. 


LinePlot2 = Graphics3D[{Thickness[.005], Red, Line[{-1.5 Rad P, 
2.5 Rad P}]}]; 

HelixOriginal = ParametricPlot3D[Helix, {t, -2 Pi, 3 Pi}, 
PlotStyle -> {Thickness[.008], Blue}]; 

Show[HelixOriginal] 


We can also think of this helix as the collection of the tips of the position vectors given in 
equation (7.5). The following code plots, as an animation, the position vector field for 
this helix as arrows from the origin to the spacecurve. Figure 7.3 shows a single frame 
from the animation, corresponding to $t = -2\pi$. 


Figure 7.3: Graph of the helix spacecurve along with the position 
vector starting at the origin terminating at the helix for $t = -2\pi$. 


OriginSphere = Graphics3D[{Black, Sphere[{0, 0, 0}, 0.2]}]; 
Manipulate[HelixExp = Evaluate[Helix /. {t -> tval}]; 
 ArrowPlotExp = Graphics3D[{Arrowheads[.05], Thickness[.005], 
 Red, Arrow[{{0, 0, 0}, HelixExp}]}]; 
 Show[OriginSphere, ArrowPlotExp, HelixOriginal, PlotRange -> 
 {{-1, 10}, {-1, 5}, {-1, 10}}, Axes -> True], 
 {{tval, -2 Pi, "t"}, -2 Pi, 3 Pi, 1/6}] 


CntrTop = N[P.QTop/P.P P] 
{-1.78649, 0.893243, 3.57297} 

CntrBot = N[P.QBot/P.P P] 
{-0.417474, 0.208737, 0.834949} 

RadTop = N[Norm[QTop - CntrTop]] 
8.66809 

RadBot = N[Norm[QBot - CntrBot]] 
7.34544 

UTop = (QTop - CntrTop)/RadTop 
{0.898293, 0.0507713, 0.436454} 

UBot = (QBot - CntrBot)/RadBot 
{0.873667, 0.33462, 0.353179} 

VTop = Cross[CntrTop, UTop]/Norm[Cross[CntrTop, UTop]] 
{0.0509252, 0.974578, -0.218182} 

VBot = Cross[CntrBot, UBot]/Norm[Cross[CntrBot, UBot]] 
{-0.21501, 0.916739, -0.33669} 


Figure 7.4: The helix is rotated about the line L; one frame of the 
animation is shown. 


CircRotTop = CntrTop + RadTop Cos[A] UTop + RadTop Sin[A] VTop; 
CircRotBot = CntrBot + RadBot Cos[A] UBot + RadBot Sin[A] VBot; 

SpaceCirclePlots = ParametricPlot3D[{CircRotTop, CircRotBot}, 
{A, 0, 2 Pi}, PlotStyle -> {{Thickness[.008], Blue}, {Thickness[.008], 
Red}}]; 

CntrHelix = P.Helix/P.P P; 

RadHelix = Norm[Helix - CntrHelix]; 

UHelix = (Helix - CntrHelix)/RadHelix; 

VHelix = Cross[CntrHelix, UHelix]/Norm[Cross[CntrHelix, UHelix]]; 

HelixRot = N[CntrHelix + RadHelix Cos[A] UHelix + RadHelix 
Sin[A] VHelix]; 

Manipulate[HelixRotExp = Evaluate[HelixRot /. {A -> Aval}]; 
 HelixRotExpPlot = ParametricPlot3D[HelixRotExp, {t, -2 Pi, 3 Pi}, 
 PlotStyle -> {Thickness[.008], Black}]; 
 Show[LinePlot2, SpaceCirclePlots, HelixRotExpPlot, PlotRange -> 
 {{-10, 10}, {-10, 10}, {-10, 10}}, Axes -> True, ViewPoint -> {-3, 
 6, 3}], 
 {{Aval, 0, "θ"}, 0, 2 Pi, Pi/12}] 

In Figure 7.4, note that we have included two circles of revolution. These circles 
correspond to the start of the helix, found when $t = -2\pi$, and the end of the helix, at 
$t = 3\pi$. Clearly, the helix is rotated perpendicularly about the line L, bounded by these two 
circles of revolution. 

The only object left to rotate about a line in $\mathbb{R}^3$ is a two-dimensional surface, which 
must be parameterized by two variables. 


Example 7.1.3. We will take the torus (or donut shape) to be our surface T with center 
at the point (4,7,9), inner radius 1, and outer radius 2, which is plotted in Figure 7.5. 
Vectorially, the equation defining T is given by 

$T(u, v) = \langle 4 + (2 + \cos(u))\cos(v),\ 7 + (2 + \cos(u))\sin(v),\ 9 + \sin(u) \rangle$

for $u \in [-\pi, \pi]$, $v \in [-\pi, \pi]$. Note that T is parallel to the xy-plane, has three 
components, and is defined by two independent variables u and v. Therefore, T is a 
two-dimensional surface in $\mathbb{R}^3$. 


Figure 7.5: The torus as defined vectorially by T given above 


Torus = {4 + (2 + Cos[u]) Cos[v], 7 + (2 + Cos[u]) Sin[v], 9 + Sin[u]}; 
ParametricPlot3D[Torus, {u, -Pi, Pi}, {v, -Pi, Pi}, Mesh -> None] 


Figure 7.6: Rotation of the torus about the line L; one frame of the 
animation is shown. 


CntrTorus = P.Torus/P.P P; 

RadTorus = Norm[CntrTorus - Torus]; 

UTorus = (Torus - CntrTorus)/RadTorus; 

VTorus = Cross[CntrTorus, UTorus]/Norm[Cross[CntrTorus, UTorus]]; 

TorusRot = CntrTorus + RadTorus Cos[A] UTorus + RadTorus 
Sin[A] VTorus; 

LinePlot3 = Graphics3D[{Thickness[.005], Black, Line[{-1 Rad P, 
3 Rad P}]}]; 

Manipulate[TorusRotExp = Evaluate[TorusRot /. {A -> Aval}]; 
 TorusRotExpPlot = ParametricPlot3D[TorusRotExp, {u, -Pi, Pi}, 
 {v, -Pi, Pi}, Mesh -> None]; 
 Show[LinePlot3, TorusRotExpPlot, PlotRange -> {{-15, 15}, 
 {-15, 15}, {-8, 22}}, Axes -> True, ViewPoint -> {-3, 6, 3}], 
 {{Aval, 0, "θ"}, 0, 2 Pi, Pi/12}] 


Homework Problems 


1. Instead of circular rotations about a line L (Fig. 7.6), construct a 
method for rotating a point Q about a line in an elliptical path. You 
may assume that the line is at the center of the ellipse and the point 
Q lies at one of the vertices at an endpoint of the major or minor 
axis of the ellipse. 

2. Generalize the method that you produced in problem 1 replacing 
circular rotations about a line L to general rotations along any 
planar closed parametric curve, such as the ellipse of problem 1, 
where the line L goes through the interior of the planar closed 
parametric curve. Can this be generalized further so that the rotation 
curve need not be planar? 

3. Express the rotation function given by 

$\vec{Q}_{rot}(\alpha) = \vec{C} + R\cos(\alpha)\,\vec{U} + R\sin(\alpha)\,\vec{V}$

in terms of matrices and matrix multiplication. The points and 
vectors should be treated as column matrices. 

4. Consider the following situation. If the vector $\vec{P}$ is parallel to the 
axis of rotation (a line through the origin) and is defined by 
$\vec{P} = \langle 0, a, b \rangle$, and the position vector $\vec{Q}$ to be rotated is given by 
$\vec{Q} = \langle c, 0, 0 \rangle$, then the method breaks down, since the center $\vec{C}$ is 
the origin or zero vector and so the unit vector $\vec{V}$ cannot be obtained by 
$\vec{V} = \dfrac{\vec{C} \times \vec{U}}{|\vec{C} \times \vec{U}|}$ because $\vec{C} \times \vec{U} = \vec{0}$. How can you circumvent this 
problem, which occurs whenever the vectors $\vec{P}$ and $\vec{Q}$ are 
perpendicular? 

5. Generalize the method of this section to rotating about an 
arbitrary line L, not necessarily going through the origin, but given 
as passing through two points P and S. 

6. Use the method from problem 5 to find the coordinates of the 
point N that is calculated by rotating the point Q(1,9,-5) about the 
line L through the two points P(7,-2,4) and S(-10,3,-5) through the 
angle a = $n. 


Mathematica Problems 
1. Rotate the point Q(1,1,1) about the line that passes through the 
origin and the point P(-1,2,-2). 
2. Determine all angles $\alpha$ from problem 1 such that $\vec{Q}_{rot}(\alpha)$ lies in 
the xy-plane, yz-plane, or xz-plane. 

3. Using your answers to homework problem 1, rotate the point 
Q(1,1,1) about the line that passes through the origin and the point 
P(3,4,3) in an elliptical path. Set the minor axis radius to the value 
$R = |\vec{Q} - \vec{C}|$ and major to 3R. 

4. Rotate a sphere of radius 1, with center located at (1,2,1), about 
the line that passes through the origin and the point P(-6,7,3). 

5. A Möbius surface M is given parametrically by 


M= G + cos(u) + vsin (5) cos(u),5 + sin(u) + vsin (5) sin(u), 4v cos (3) 


for -n < u < m and — $ <v< S Rotate M about the line that passes 
through the origin and the point P(3,4,—7). 
6. Using homework problem 5, animate the rotation of the helix 

$\langle x(t), y(t), z(t) \rangle = \langle 9 + 2\sin(t),\ 2 + 2\cos(t),\ t \rangle$, $t \in [-2\pi, 2\pi]$

about the line L through the two points P(7,-2,4) and S(-10,3,-5). 

7. Using homework problem 5, animate the rotation of the Möbius 
surface M, of problem 5, about the line L through the two points 
P(7,-2,4) and S(-10,3,-5). 

8. If you did either of homework problems 1 and 2, test to see if 
your ideas worked by computing and plotting an example using 
Mathematica. 


7.2 “Rolling” a Circle along a Curve 


We now combine calculus, vectors, and parametric curves in an attempt to 
mathematically roll a circle along a parametric curve. The circles will 
have a constant radius R in our first example, while for the second 
example the circles will have a variable radius. Our circles will be tangent 
to the base curve along which they are rolling in both versions, since this 
will make the rolling appear closest to the actual physical process. In order 
to visualize this process, a vector format for our curves and their tangents 
is necessary. 


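If we have a parametric curve in the plane defined by x = f(t) and y = g(t) 
with t in the interval [a, b] as our base curve, its vector format is 
$\vec{X}(t) = \langle f(t), g(t) \rangle$. This is called the position vector field of the 
parametric curve. Its derivative vector field or tangent vector field $\vec{T}(t)$ is 

(7.6)   $\vec{T}(t) = \dfrac{d\vec{X}}{dt} = \left\langle \dfrac{df}{dt}, \dfrac{dg}{dt} \right\rangle$

A vector field perpendicular to $\vec{T}(t)$ is 

(7.7)   $\vec{P}(t) = \left\langle -\dfrac{dg}{dt}, \dfrac{df}{dt} \right\rangle$

since 

$\vec{T}(t) \cdot \vec{P}(t) = \left\langle \dfrac{df}{dt}, \dfrac{dg}{dt} \right\rangle \cdot \left\langle -\dfrac{dg}{dt}, \dfrac{df}{dt} \right\rangle = 0$

Also, $\vec{P}(t)$ can be made length R, corresponding to the radius of our 
circles, by first making it a unit vector field and then multiplying by R. 
Then the center vector field $\vec{C}(t)$ for our circles is given by 

(7.8)   $\vec{C}(t) = \vec{X}(t) + R\,\dfrac{\vec{P}(t)}{|\vec{P}(t)|}$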


If we now write $\vec{C}(t) = \langle H(t), K(t) \rangle$, then our rolling circles are given by 
the parametric curve: 

(7.9)   $\langle x(s), y(s) \rangle = \langle R\cos(s) + H(t),\ R\sin(s) + K(t) \rangle = R\,\langle \cos(s), \sin(s) \rangle + \vec{C}(t)$

As s varies in the interval $[0, 2\pi]$, we move around this circle, while we get 
one circle for each fixed value $t \in [a, b]$. In this situation, s is the variable 
that moves us around each circle while t is the rolling variable that moves 
us from one circle to the next circle because it controls the location of the 
circles' centers. 


The formula for $\vec{C}(t)$ in (7.8) may appear complicated; however, if we 
take a minute to examine each piece, we should begin to see how simple it 
really is. Since $\vec{C}(t)$ is the center of the circle that we wish to roll along 
the curve, we need to start on the curve, $\vec{X}(t)$, and move R units 
perpendicular to $\vec{T}(t)$. Since $\vec{P}(t)$ is orthogonal to $\vec{T}(t)$, which in turn is 
tangent to $\vec{X}(t)$, we have that $\vec{P}(t)$ is orthogonal to the curve $\vec{X}(t)$. To 
proceed R units in the $\vec{P}(t)$ direction, we must first make $\vec{P}(t)$ unit length 
by dividing it by its magnitude, and then multiply it by R. This gives the 
second piece of the sum found in (7.8). 
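
The three pieces of (7.8) can be packaged as one small function. This is only a sketch: the name rollCenter and its arguments are hypothetical and do not appear in the text, and the base curve is assumed to be given as a two-component list in the variable t with a nonvanishing tangent.

```mathematica
(* Hypothetical helper, not from the text: center of a circle of radius r
   resting on the planar base curve base = {f[t], g[t]}, per equation (7.8). *)
rollCenter[base_, t_, r_] := Module[{tanv, perp},
  tanv = D[base, t];                  (* tangent field, equation (7.6) *)
  perp = {-tanv[[2]], tanv[[1]]};     (* perpendicular field, equation (7.7) *)
  base + r perp/Sqrt[perp.perp]       (* move r units along the unit perpendicular *)
]

(* Base curve y = 3 sin(x) with radius 0.3, evaluated at t = 0 *)
rollCenter[{t, 3 Sin[t]}, t, 0.3] /. t -> 0
```

For the curve and radius of Example 7.2.1, the value at t = 0 should be about {-0.2846, 0.0949}, the first center listed in that example. Using Sqrt[perp.perp] rather than Norm avoids the Abs terms that Norm produces for symbolic input, which is a design choice of this sketch and not something the text prescribes.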


Example 7.2.1. Let us look at an example to see this rolling in action. As the base 
parametric curve, let us use the parametric version of y = 3sin(x) given by x = t and y = 
3sin(t) for $t \in [0, 2\pi]$. The plot, given in Figure 7.7, will contain both the base parametric 
curve and seven circles spaced 1 radian apart, with each circle having a radius of 0.3. 


Bv = {t, 3 Sin[t]}; 


BPlot = ParametricPlot[Bv, {t, -1, 7}, PlotStyle -> {Thickness[0.005], 
Black}]; 

TanBv = D[Bv, t] 

{1, 3 Cos[t]} 

PerpP = {-TanBv[[2]], TanBv[[1]]} 

{-3 Cos[t], 1} 

UnitPerpU = PerpP/Norm[PerpP] 

{-3 Cos[t]/Sqrt[1 + 9 Abs[Cos[t]]^2], 1/Sqrt[1 + 9 Abs[Cos[t]]^2]} 

TanBv.PerpP 

0 

R = 0.3; 
CntrV = Bv + R UnitPerpU 

{t - 0.9 Cos[t]/Sqrt[1 + 9 Abs[Cos[t]]^2], 
 0.3/Sqrt[1 + 9 Abs[Cos[t]]^2] + 3 Sin[t]} 

CntrVals = Table[CntrV, {t, 0, 6}] 

{{-0.284605, 0.0948683}, {0.74468, 2.68193}, {2.23415, 2.91544}, 
{3.28432, 0.51909}, {4.26725, -2.13412}, {4.80557, -2.6483}, 
{5.71659, -0.739859}} 


Figure 7.7: Graph of the circles, centers of circles, and parametric 
curve. 


CntrPlots = ListPlot[{CntrVals}, PlotMarkers -> {"+", Medium}, 
PlotStyle -> Red]; 

CirclePlots = Table[ParametricPlot[CntrVals[[k]] + R {Cos[s], Sin[s]}, 
{s, 0, 2 Pi}, PlotStyle -> Blue], {k, 1, 7}]; 
Show[BPlot, CntrPlots, CirclePlots, PlotRange -> {{-1, 7}, {-4, 4}}] 


Next, we use the Manipulate command to create an animation of these circles “rolling” 
along the curve (see Fig. 7.8). 


Figure 7.8: The circle “rolls” along the curve using the 
Manipulate command; one frame of the animation is shown. 


Manipulate[CenterVal = Evaluate[CntrV /. {t -> tval}]; 
 CenterValPlot = ListPlot[{CenterVal}, PlotMarkers -> {"+", 
 Medium}, PlotStyle -> Red]; 
 CircleValPlot = ParametricPlot[CenterVal + R {Cos[s], Sin[s]}, 
 {s, 0, 2 Pi}, PlotStyle -> Blue]; 
 Show[BPlot, CenterValPlot, CircleValPlot, PlotRange -> {{-1, 7}, 
 {-4, 4}}], 
 {{tval, 0, "t"}, 0, 2 Pi, Pi/16}] 


Next, we modify the calculations given above with a nonconstant radius. In particular, we 
are going to choose the radius to be $1/\kappa$, where $\kappa$ is the curvature of the function. 
Curvature, loosely defined, is the amount that a function changes direction per unit 
length. For a parametric spacecurve $\vec{P}(t) = \langle x(t), y(t), z(t) \rangle$, the formula 
for curvature is given by 

(7.10)   $\kappa(t) = \dfrac{|\vec{P}'(t) \times \vec{P}''(t)|}{|\vec{P}'(t)|^3}$

We omit the derivation, which can be found in most multivariate calculus books. The 
formula given in equation (7.10) simplifies somewhat in two dimensions. If 
$\vec{P}(t) = \langle x(t), y(t) \rangle$, then 

(7.11)   $\kappa(t) = \dfrac{|x'(t)\,y''(t) - x''(t)\,y'(t)|}{\left((x'(t))^2 + (y'(t))^2\right)^{3/2}}$

This formula can be computed quite readily from the three-dimensional version by setting 
z(t) = 0. For a circle of radius r, it can be shown that the curvature at any given point is $1/r$. 
Thus, if we fix a point on our graph and compute its curvature $\kappa$ and draw a circle with 
radius $1/\kappa$ touching the function at that point, in essence, we have created a circle of best 
fit for the function at that specified point. This also implies that the circle should always 
lie on the concave-in side of the function and should have the same tangent line at the 
point. A circle that satisfies these criteria is sometimes referred to as an osculating circle. 
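
To make the reduction from (7.10) to (7.11) concrete, the block below first carries out the step of setting $z(t) = 0$ and then evaluates (7.11) on a simple curve. The parabola $\langle t, t^2 \rangle$ is an illustrative choice made here, not an example from the text.

```latex
\begin{aligned}
\vec{P}(t) &= \langle x(t), y(t), 0 \rangle
  \;\Rightarrow\;
  \vec{P}'(t) \times \vec{P}''(t) = \langle 0,\ 0,\ x'(t)y''(t) - x''(t)y'(t) \rangle,
  \qquad |\vec{P}'(t)|^3 = \left((x'(t))^2 + (y'(t))^2\right)^{3/2}, \\[4pt]
\vec{P}(t) &= \langle t, t^2 \rangle:
  \quad x' = 1,\ x'' = 0,\ y' = 2t,\ y'' = 2
  \;\Rightarrow\;
  \kappa(t) = \frac{|1\cdot 2 - 0\cdot 2t|}{(1 + 4t^2)^{3/2}} = \frac{2}{(1 + 4t^2)^{3/2}}.
\end{aligned}
```

At the vertex $t = 0$ this gives $\kappa = 2$, so the circle of best fit there has radius $1/2$; far from the vertex $\kappa$ tends to 0 and the circle grows without bound, matching the intuition that the parabola flattens out.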


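To see how this works, we will start with a curve in the plane given parametrically by 
$\vec{P}(t) = \langle x(t), y(t) \rangle$. We have the formula for curvature $\kappa(t)$ already given. 
We now need to find the center of the circle that has radius $1/\kappa(t)$. To do this, we need to 
find the vector orthogonal to the tangent vector that points toward the concave-in side of 
the curve. So, first we compute the unit tangent vector, defined as 

(7.12)   $\vec{T}(t) = \dfrac{\vec{P}'(t)}{|\vec{P}'(t)|} = \left\langle \dfrac{x'(t)}{\sqrt{(x'(t))^2 + (y'(t))^2}},\ \dfrac{y'(t)}{\sqrt{(x'(t))^2 + (y'(t))^2}} \right\rangle$

With a little bit of work, we can compute the derivative of the unit tangent vector: 

(7.13)   $\dfrac{d}{dt}\vec{T}(t) = \left\langle \dfrac{x''(t)(y'(t))^2 - x'(t)y'(t)y''(t)}{\left((x'(t))^2 + (y'(t))^2\right)^{3/2}},\ \dfrac{y''(t)(x'(t))^2 - x'(t)y'(t)x''(t)}{\left((x'(t))^2 + (y'(t))^2\right)^{3/2}} \right\rangle$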

It may not be immediately obvious, but the unit tangent vector and its derivative are 
orthogonal. We could actually compute the dot product between the expressions given in 
(7.12) and (7.13), or we could make the following observation: Since $\vec{T}(t)$ is a unit 
vector, $|\vec{T}(t)| = 1$ for all t, and so the derivative of the magnitude of $\vec{T}(t)$ must be zero: 

(7.14)   $\dfrac{d}{dt}|\vec{T}(t)| = 0$

However, we know that $|\vec{T}(t)|^2 = \vec{T}(t) \cdot \vec{T}(t)$, and thus 

(7.15)   $\dfrac{d}{dt}|\vec{T}(t)|^2 = 0$


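Now, we introduce an important identity involving the derivative of the dot product. If 
$\vec{S}(t)$ and $\vec{T}(t)$ are two vector-valued functions, then 

(7.16)   $\dfrac{d}{dt}\left(\vec{S}(t) \cdot \vec{T}(t)\right) = \vec{S}'(t) \cdot \vec{T}(t) + \vec{S}(t) \cdot \vec{T}'(t)$

Applying this formula to $|\vec{T}(t)|^2$ gives us the following: 

$\dfrac{d}{dt}|\vec{T}(t)|^2 = \dfrac{d}{dt}\left(\vec{T}(t) \cdot \vec{T}(t)\right)$
$= \vec{T}'(t) \cdot \vec{T}(t) + \vec{T}(t) \cdot \vec{T}'(t)$
$= 2\,\vec{T}'(t) \cdot \vec{T}(t)$

Using equation (7.15) and the last line of the set of equalities above gives 

$\vec{T}'(t) \cdot \vec{T}(t) = 0$

and we can therefore conclude that $\vec{T}'(t)$ and $\vec{T}(t)$ are orthogonal vectors. We now 
define the unit normal vector $\vec{N}(t)$ as 

(7.17)   $\vec{N}(t) = \dfrac{\vec{T}'(t)}{|\vec{T}'(t)|}$

Finally, to locate the center of the circle, we start at the base point $\vec{P}(t)$ and move a 
distance of $1/\kappa(t)$ in the direction $\vec{N}(t)$; therefore 

(7.18)   $\vec{C}(t) = \vec{P}(t) + \dfrac{1}{\kappa(t)}\,\vec{N}(t)$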
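
Before turning to the spiral, the chain (7.11), (7.12), (7.17), (7.18) can be tried on a curve simple enough to check by hand. The sketch below uses the parabola {t, t^2}, which is an assumption made here for illustration; all variable names are hypothetical and chosen in lower case so they do not collide with the symbols used in the examples of this section.

```mathematica
(* Hypothetical check on the parabola {t, t^2}; not part of the original text. *)
pos = {t, t^2};
vel = D[pos, t];
unitT = vel/Sqrt[vel.vel];     (* unit tangent, equation (7.12) *)
dT = D[unitT, t];              (* derivative of the unit tangent *)
Simplify[unitT.dT]             (* orthogonality of T and T' *)

unitN = dT/Sqrt[dT.dT];        (* unit normal, equation (7.17) *)
kappa = Abs[vel[[1]] D[vel[[2]], t] - D[vel[[1]], t] vel[[2]]]/
   (vel.vel)^(3/2);            (* curvature, equation (7.11) *)
center = pos + unitN/kappa;    (* center of the circle, equation (7.18) *)
center /. t -> 0
```

The Simplify call should return 0, confirming the orthogonality argument for this curve, and the center at the vertex should come out as {0, 1/2}: directly above the vertex, on the concave side, at distance $1/\kappa = 1/2$.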


Example 7.2.2. Now, we will do an example with a parametric curve in the xy-plane. The 
curve we will use is in the class of logarithmic spirals. In particular, we choose 

$\vec{P}(t) = \langle 2e^{-0.2t}\cos(t),\ 2e^{-0.2t}\sin(t) \rangle$

We will have Mathematica perform all the computations. Pay special attention to how we 
compute higher-order derivatives. We plot the curvature in Figure 7.9, while Figures 7.10 
and 7.11 correspond to the process of rolling a circle, with radius inversely proportional 
to curvature, along the logarithmic spiral. 


X = 2 E^(-0.2 t) Cos[t]; 
Y = 2 E^(-0.2 t) Sin[t]; 
DX = D[X, t] 

-0.4 E^(-0.2 t) Cos[t] - 2 E^(-0.2 t) Sin[t] 

DY = D[Y, t] 

2 E^(-0.2 t) Cos[t] - 0.4 E^(-0.2 t) Sin[t] 

D2X = D[X, {t, 2}] 

-1.92 E^(-0.2 t) Cos[t] + 0.8 E^(-0.2 t) Sin[t] 

D2Y = D[Y, {t, 2}] 

-0.8 E^(-0.2 t) Cos[t] - 1.92 E^(-0.2 t) Sin[t] 


Figure 7.9: Graph of curvature for the parametric function 
representing the logarithmic spiral. 


κ = Abs[DX D2Y - D2X DY]/((DX^2 + DY^2)^(3/2)); 
Plot[κ, {t, 0, 5 Pi}] 


Figure 7.10: The osculating circle to the logarithmic spiral at $t = \pi/2$. 


PositionV = {X, Y}; 

TangentV = {DX, DY}/Sqrt[DX^2 + DY^2]; 

NormalV = D[TangentV, t]/Norm[D[TangentV, t]]; 

EndPointV = PositionV + (1/κ) NormalV; 

SpiralPlot = ParametricPlot[PositionV, {t, 0, 5 Pi}, PlotRange -> 
{{-2, 2}, {-2, 2}}, PlotStyle -> {Black, Thickness[0.01]}]; 

CntrRadPlots = Graphics[{PointSize[0.03], Blue, Point[{EndPointV 
/. t -> Pi/2, PositionV /. t -> Pi/2}]}]; 

LinePlot = Graphics[{Thickness[.005], Blue, Line[{EndPointV /. t -> 
Pi/2, PositionV /. t -> Pi/2}]}]; 

CircleκPlot = ContourPlot[Evaluate[((x - EndPointV[[1]])^2 + (y 
- EndPointV[[2]])^2 == (1/κ)^2) /. t -> Pi/2], {x, -2, 2}, {y, -2, 2}, 
ContourStyle -> {Red, Thickness[0.007]}, PlotRange -> All, Axes -> 
True, Frame -> False]; 

Show[SpiralPlot, CircleκPlot, CntrRadPlots, LinePlot] 


The above calculations work well for a parametrically defined function in the plane. This 
method also works for standard functions of one variable of the form y = f(x), since they 
can be parameterized in the form $\vec{P}(t) = \langle t, f(t) \rangle$. Now, the following question remains: 
How do we modify this formula for functions in $\mathbb{R}^3$ of the form $\vec{P}(t) = \langle x(t), y(t), z(t) \rangle$? 
The first thing that we must do is determine what plane the osculating circle lies in. 
Fortunately, it lies in the osculating plane, which is formed by the unit tangent and unit 
normal vectors $\vec{T}(t)$ and $\vec{N}(t)$, respectively. The work still to be done is to compute the 
equation of the circle of radius $1/\kappa$ with center in the osculating plane. We leave it to the 
interested reader to modify the code for the osculating circle in the plane to make an 
animated plot of an osculating circle in $\mathbb{R}^3$. 


We end this example with a Manipulate command which animates the osculating circle 
as a function of t for the logarithmic spiral. 


Figure 7.11: The osculating circle to a logarithmic spiral; one frame of 
the animation is shown. 


Manipulate[EndPointVal = Evaluate[EndPointV /. {t -> tval}]; 
 CntrRadValPlot = Graphics[{PointSize[0.02], Blue, Point[ 
 {EndPointVal, Evaluate[PositionV /. t -> tval]}]}]; 
 LineValPlot = Graphics[{Thickness[.005], Blue, Line[{EndPointVal, 
 Evaluate[PositionV /. t -> tval]}]}]; 
 CircleκValPlot = ContourPlot[Evaluate[((x - EndPointVal[[1]])^2 
 + (y - EndPointVal[[2]])^2 == (1/κ)^2) /. t -> tval], {x, -2, 2}, 
 {y, -2, 2}, ContourStyle -> {Red, Thickness[0.007]}, PlotRange -> All, 
 Axes -> True, Frame -> False, PlotPoints -> 50, MaxRecursion -> 2]; 
 Show[SpiralPlot, CircleκValPlot, CntrRadValPlot, LineValPlot, 
 PlotRange -> {{-2, 2}, {-2, 2}}], 
 {{tval, 0, "t"}, 0, 5 Pi, Pi/4}] 


Homework Problems 


1. Verify that a circle of radius r, defined by the position vector 
field $\vec{P}(t) = \langle r\cos(t), r\sin(t) \rangle$, has curvature $\kappa = 1/r$. 

2. The position vector field $\vec{P}(t) = \langle r\cos(at), r\sin(at) \rangle$, with a > 
0, still defines a circle of radius r. One revolution takes $2\pi/a$ time 
units, instead of $2\pi$. Compute the curvature of $\vec{P}(t)$ and compare 
your answer to problem 1. 
3. Compute the curvature of the helix defined by $\vec{P}(t) = \langle \cos(t), 
\sin(t), t \rangle$. 
4. Compute the unit tangent vector field $\vec{T}(t)$, unit normal vector 
field $\vec{N}(t)$, and the cross product $\vec{T}(t) \times \vec{N}(t)$, of the helix $\vec{P}(t) = 
\langle \cos(t), \sin(t), t \rangle$. 
5. Explain why $\vec{T}(t) \times \vec{N}(t)$ is always a unit vector field. 
6. (a) Find the formula for the curvature of a standard curve y = f(x) 
by converting it to parametric form. 
(b) Use this curvature formula to find the curvature function 
$\kappa(x)$ for y = sin(x). 
(c) Explain why the only functions of one variable with 
constant zero curvature (i.e., $\kappa(x) = 0$ for all x) are linear 
functions y = mx + b. 
7. This section could really be described as “Sliding a ball along a 
curve,” since we are not really rotating the circle along the curve. A 
circle is symmetric about its center; therefore, it is easy to imagine 
that the circle really is rolling along the curve. Devise a method for 
actually rolling the circle along the curve. Do the same for rolling 
an ellipse along a curve. (Hint: Arclength may come into play here.) 


8. Find (look up the pseudosphere) a parametric surface whose 
curvature is a fixed negative real constant, and verify that this is 
true. This requires a new definition of curvature where curvature is 
allowed to be negative, and to ensure that it will also apply to 
surfaces. 


Mathematica Problems 


1. Graph the curvature of an ellipse given by the position vector 
field $\vec{P}(t) = \langle 2\cos(t), 3\sin(t) \rangle$ for $t \in [0, 2\pi]$. 

2. Graph the curvature of an ellipse given by the position vector 
field $\vec{P}(t) = \langle 2\cos(2t), 3\sin(2t) \rangle$ for $t \in [0, \pi]$. 

3. Plot the unit tangent and unit normal vector fields to the ellipse 
$\vec{P}(t) = \langle 2\cos(t), 3\sin(t) \rangle$ for $t \in [0, 2\pi]$. 

4. Roll a circle of radius i in a counterclockwise fashion around the 
interior of the ellipse $\vec{P}(t) = \langle 2\cos(t), 3\sin(t) \rangle$ for $t \in [0, 2\pi]$. 

5. Roll a circle of radius 1 in a clockwise fashion around the interior 
of the ellipse $\vec{P}(t) = \langle 2\cos(t), 3\sin(t) \rangle$ for $t \in [0, 2\pi]$. 

6. Roll a circle of radius Ł in a counterclockwise fashion around the 
interior of the ellipse $\vec{P}(t) = \langle 2\cos(t), 3\sin(t) \rangle$ for $t \in [0, 2\pi]$. 

7. Roll a circle of radius 2 in a clockwise fashion around the interior 
of the ellipse $\vec{P}(t) = \langle 2\cos(t), 3\sin(t) \rangle$ for $t \in [0, 2\pi]$. 

8. Roll a circle of radius 1 inside of the helix 7 (£) = (cos(¢), sin(Z)). 


9. Use the method devised from homework problem 7 to actually 
rotate the circle in problem 4. 


10. Plot the original function y = sin(x) together with its curvature 
function from homework problem 6 (b). 


7.3 The TNB Frame 


Since we introduced the tangent and normal vector fields, $\vec{T}(t)$ and $\vec{N}(t)$, 
respectively, to the position vector field $\vec{P}(t) = \langle x(t), y(t), z(t) \rangle$ in Section 
7.2, we might as well introduce one last vector field. It was shown that the 
tangent and normal vector fields were orthogonal; therefore, by the 
definition of the cross product, we can find a third vector field that is 
orthogonal to both $\vec{T}(t)$ and $\vec{N}(t)$. This is how we define the binormal 
vector field $\vec{B}(t)$: 

(7.19)   $\vec{B}(t) = \vec{T}(t) \times \vec{N}(t)$

The binormal vector field $\vec{B}(t)$, together with the tangent and normal 
vector fields $\vec{T}(t)$ and $\vec{N}(t)$, makes up what is known as the TNB frame. 


As a side note, we could have defined $\vec{B}(t)$ as the cross product of $\vec{N}(t)$ 
with $\vec{T}(t)$. This still would have yielded a third orthogonal vector; 
however, we wish for the TNB frame to satisfy the right-hand rule, which 
stipulates that if we start with our right hand outstretched in the direction 
of the tangent vector, curl our fingers in the direction of the normal vector, 
and then stick our thumb out, we get the direction of the binormal vector. 
This rule is commonly used in physics. 
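
The effect of swapping the order can be stated in one line. The identity below is the standard anticommutativity of the cross product, applied to the definition (7.19); it is included only as a reminder and uses no notation beyond that of this section.

```latex
\vec{N}(t) \times \vec{T}(t) = -\left(\vec{T}(t) \times \vec{N}(t)\right) = -\vec{B}(t)
```

So the alternative definition would produce a vector of the same length along the same line, but pointing the opposite way, which would make the ordered triple left-handed.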


Note that since $\vec{T}(t)$ and $\vec{N}(t)$ are unit vector fields, the binormal vector 
field is also a unit vector field. Remember that we have the formula 

$|\vec{B}(t)| = |\vec{T}(t) \times \vec{N}(t)| = |\vec{T}(t)|\,|\vec{N}(t)|\,\sin(\theta)$

and by construction, $|\vec{T}(t)| = |\vec{N}(t)| = 1$ and $\sin(\theta) = 1$. Therefore we can 
conclude that $|\vec{B}(t)| = 1$, and thus $\vec{B}(t)$ is of unit length. Now, we have three 
orthogonal unit length vector fields in $\mathbb{R}^3$ associated with any spacecurve 
$\vec{P}(t)$. The set 

(7.20)   $\{\vec{T}(t), \vec{N}(t), \vec{B}(t)\}$

forms an orthonormal basis for $\mathbb{R}^3$ for any appropriate value of t in the 
domain of $\vec{P}(t)$. This means that any vector in $\mathbb{R}^3$ can be expressed as a 
linear combination of the unit vectors in the TNB frame. We will explore 
the concept of orthonormal bases in Chapter 8. 
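
For an orthonormal basis the coefficients of that linear combination are simply dot products. The formula below is the standard expansion in an orthonormal basis, written for the TNB frame at a fixed value of t; the vector $\vec{w}$ is an arbitrary vector in $\mathbb{R}^3$ introduced here for illustration.

```latex
\vec{w} = \left(\vec{w}\cdot\vec{T}(t)\right)\vec{T}(t)
        + \left(\vec{w}\cdot\vec{N}(t)\right)\vec{N}(t)
        + \left(\vec{w}\cdot\vec{B}(t)\right)\vec{B}(t)
```

Taking the dot product of both sides with $\vec{T}(t)$, $\vec{N}(t)$, or $\vec{B}(t)$ and using orthonormality recovers each coefficient, which is one way to see why the expansion holds.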


The TNB frame is used, for example, in spacecraft navigation, where 
having an orthogonal set of linearly independent vectors with all three 
vectors having physical interpretations, comes in very handy. The TNB 
frame is sometimes referred to as the Frenet-Serret frame, and is related 
to the Frenet-Serret formulas found in differential geometry. 


Example 7.3.1. This section is quite short, as very little new information was introduced. 
However, to use Mathematica to illustrate this concept requires a reasonable amount of 
coding. As an example, we will compute both the curvature (Fig. 7.12) and the TNB 
frame (Fig. 7.13) for the space curve given parametrically by 

$\vec{P}(t) = \langle \cos(3t), \sin(2t), \cos(2t) \rangle$

Pay special attention to the procedure at the end of this section of Mathematica code. 
Using the Manipulate command (Fig. 7.14) is an exceptionally flexible way to animate 
the TNB frame along the given curve: 


Figure 7.12: Graph of curvature for the spacecurve $\vec{P}(t)$. 


X = Cos[3 t]; Y = Sin[2 t]; Z = Cos[2 t]; 

DX = D[X, t]; DY = D[Y, t]; DZ = D[Z, t]; 

D2X = D[DX, t]; D2Y = D[DY, t]; D2Z = D[DZ, t]; 

PositionV = {X, Y, Z}; 

DPositionV = D[PositionV, t]; 

TangentV = DPositionV/Sqrt[DPositionV[[1]]^2 + DPositionV[[2]]^2 + 
DPositionV[[3]]^2]; 

DTangentV = D[TangentV, t]; 

NormalV = DTangentV/Sqrt[DTangentV[[1]]^2 + DTangentV[[2]]^2 + 
DTangentV[[3]]^2]; 

BiNormalV = Cross[TangentV, NormalV]; 

κ = Sqrt[(DY D2Z - D2Y DZ)^2 + (DX D2Z - D2X DZ)^2 + (DX 
D2Y - D2X DY)^2]/((DX^2 + DY^2 + DZ^2)^(3/2)); 

Plot[κ, {t, 0, 2 Pi}] 


N[Evaluate[(NormalV.TangentV) /. t -> 2]] 
-1.38778*10^-17 
N[Evaluate[(NormalV.BiNormalV) /. t -> 2]] 
-1.11022*10^-16 
N[Evaluate[(TangentV.BiNormalV) /. t -> 2]] 
0. 


Figure 7.13: TNB frame for the spacecurve at time t = 2. 


SpaceCurve = ParametricPlot3D[PositionV, {t, 0, 2 Pi}, PlotStyle -> 
{Thickness[0.008], Black}]; 


TNBPlots = Graphics3D[{Arrowheads[.07], Thickness[.015], Red, 
Arrow[{N[PositionV /. t -> 2], N[(PositionV + TangentV) /. t -> 2]}], 
Yellow, Arrow[{N[PositionV /. t -> 2], N[(PositionV + NormalV) /. 
t -> 2]}], Blue, Arrow[{N[PositionV /. t -> 2], N[(PositionV + 
BiNormalV) /. t -> 2]}]}]; 


Show[SpaceCurve, TNBPlots, PlotRange -> {{-1.5, 1.5}, {-1.5, 1.5}, 
{-1.5, 1.5}}] 


Figure 7.14: TNB frame for one value of t using the Manipulate 
command. 


Manipulate[PositionVal = N[PositionV /. {t -> tval}]; 
 TangentVal = N[TangentV /. {t -> tval}]; 
 NormalVal = N[NormalV /. {t -> tval}]; 
 BiNormalVal = N[BiNormalV /. {t -> tval}]; 
 ArrowPlots = Graphics3D[{Arrowheads[.07], Thickness[.015], Red, 
 Arrow[{PositionVal, PositionVal + TangentVal}], Yellow, Arrow[{ 
 PositionVal, PositionVal + NormalVal}], Blue, Arrow[{PositionVal, 
 PositionVal + BiNormalVal}]}]; 
 Show[SpaceCurve, ArrowPlots, PlotRange -> {{-2, 2}, {-2, 2}, {-2, 
 2}}, Axes -> True, ViewPoint -> {-3, 6, 3}], 
 {{tval, 0, "t"}, 0, 2 Pi, Pi/12}] 


Mathematica Problems 


1. Plot the TNB frame for the helix given by $\vec{P}(t) = \langle \cos(t), \sin(t), t \rangle$ 
with $0 \le t \le 4\pi$. 


2. Plot the TNB frame for the spacecurve given by $\vec{P}(t) = \langle t, 
t\cos(4t), t\sin(4t) \rangle$ with $0 \le t \le 4\pi$. 
3. Plot the TNB frame for the spacecurve 

$\vec{P}(t) = \langle (2 + \cos(2t))\cos(3t),\ (2 + \cos(2t))\sin(3t),\ \sin(4t) \rangle$

with $0 \le t \le 4\pi$. This spacecurve is known as a figure-eight knot. 


4. The following space curve lies on a torus and is given by 

$\vec{P}(t) = \langle (3 + 2\cos(t))\cos(10t),\ (3 + 2\cos(t))\sin(10t),\ 2\sin(t) \rangle$

with $0 \le t \le 2\pi$. Plot, in the same figure, the following: $\vec{P}(t)$, the TNB 
frame for $\vec{P}(t)$, and the torus on which $\vec{P}(t)$ lies, which is given 
parametrically by 

$\vec{F}(u, v) = \langle (3 + 2\cos(v))\cos(u),\ (3 + 2\cos(v))\sin(u),\ 2\sin(v) \rangle$, for $-\pi \le u, v \le \pi$ 

5. The spacecurve 

$\vec{P}(t) = \langle (3 + 2\cos(3t))\cos(7t),\ (3 + 2\cos(3t))\sin(7t),\ 2\sin(3t) \rangle$

for $0 \le t \le 2\pi$, also lies on the torus defined in the previous problem. 
Plot, in the same figure, $\vec{P}(t)$, the TNB frame for $\vec{P}(t)$, and the 
torus. 

6. Plot the curvature function, $\kappa(t)$, for each of the space curves 
given in problems 1-5. 


7. Look up the Frenet-Serret equations and the torsion function $\tau(t)$. 


8. Verify the Frenet-Serret equations for the spacecurves in 
problems 1-5. 


9. For a parametric surface with position vector field $\vec{F}(u, v)$, 
$\dfrac{\partial \vec{F}}{\partial u}$ and $\dfrac{\partial \vec{F}}{\partial v}$ are two tangent vector fields for this surface. Then 
$\dfrac{\partial \vec{F}}{\partial u} \times \dfrac{\partial \vec{F}}{\partial v}$ is a normal vector field for this surface. How can we use this 
information to find the equation of the tangent plane to this surface 
at the point $P(\vec{F}(u_0, v_0))$ on this surface? 


10. Find the equation of the tangent plane to the torus defined in 
problem 4 at the point P (? (§, +)) Plot both the torus and the 


tangent plane that you found. 




Chapter 8 


Independence, Basis, and
Dimension for Subspaces of $\mathbb{R}^n$


8.1 Subspaces of $\mathbb{R}^n$


If you recall, in Section 6.1 we introduced the definition of $\mathbb{R}^n$ as a vector
space for the scalars chosen from $\mathbb{R}$. For completeness, we redefine it
here.


Definition 8.1.1. The vector space $\mathbb{R}^n$ over the scalar field $\mathbb{R}$ is the set of
vectors $\vec{v}$ represented by arrows starting at the base point of the origin
$(0, 0, \ldots, 0)$ in the point set $\mathbb{R}^n$ and stopping at the point $P = (p_1, p_2, \ldots, p_n)$
in the point set $\mathbb{R}^n$. We write the vector $\vec{v}$ as $\vec{v} = (p_1, p_2, \ldots, p_n)$
for the $n$ real components $p_1, p_2, \ldots, p_n$:


$$\mathbb{R}^n = \{ (p_1, p_2, \ldots, p_n) \mid (p_1, p_2, \ldots, p_n) \in \mathbb{R}^n \}$$


The vector space $\mathbb{R}^n$ allows scalar multiplication to be performed only
when the scalar is a real number, which is what is meant when we say that
$\mathbb{R}^n$ is a vector space over the field of scalars $\mathbb{R}$. The way we do arithmetic
(addition and scalar multiplication) in the vector space $\mathbb{R}^n$ is as if its
elements are real row or column matrices in $\mathbb{R}^{1 \times n}$ or $\mathbb{R}^{n \times 1}$. Also, the
context and difference in notation will hopefully make clear when we
are using a vector $\vec{v}$ from $\mathbb{R}^n$ versus a point $P$ from $\mathbb{R}^n$, as both sets are
called $\mathbb{R}^n$ but are very different things mathematically, since only in the
vector space version of $\mathbb{R}^n$ can we do arithmetic.


The vector space $\mathbb{R}^n$ over the scalar field $\mathbb{R}$ is said to be the real
Euclidean vector space of dimension $n$ or, for short, real Euclidean
n-space. The two real Euclidean n-spaces that we are most familiar with
are


$$\mathbb{R}^2 = \{(p_1, p_2) \mid p_i \in \mathbb{R}\}$$


which is the vector space of two-dimensional vectors and is visualized as
the xy-plane, and


$$\mathbb{R}^3 = \{(p_1, p_2, p_3) \mid p_i \in \mathbb{R}\}$$


which is the vector space of three-dimensional vectors and is visualized as
xyz-space. For completeness, we will say that $\mathbb{R}^0$ is only the real number
zero and it is the real zero-dimensional vector space.


In Definition 8.1.1, we defined the real Euclidean vector space $\mathbb{R}^n$, but it
is not always possible to restrict ourselves to real numbers;
sometimes out of necessity we must use complex numbers instead, such as
when we find the roots of real coefficient polynomials. For instance, $x^2 + 1 = 0$
has the purely imaginary roots $x = \pm i$. This will also hold in Chapter 12,
when we study eigenvalues and eigenvectors. As such, we need to define
the complex Euclidean vector space $\mathbb{C}^n$ of dimension $n$, which is merely
the complex version of the real Euclidean vector space $\mathbb{R}^n$ of dimension $n$.


Definition 8.1.2. The vector space $\mathbb{C}^n$ over the scalar field $\mathbb{C}$ (short for
complex Euclidean n-space) is the set of vectors written as
$\vec{v} = (q_1, q_2, \ldots, q_n)$ for the $n$ complex components $q_1, q_2, \ldots, q_n$ with


$$\mathbb{C}^n = \{ (q_1, q_2, \ldots, q_n) \mid (q_1, q_2, \ldots, q_n) \in \mathbb{C}^n \}$$


The vector space $\mathbb{C}^n$ over the field of scalars $\mathbb{C}$ allows scalar
multiplication to be done only when the scalar is a complex number. This
is what we mean by saying that $\mathbb{C}^n$ is a vector space over the scalar field
$\mathbb{C}$. Furthermore, $\mathbb{R} \subset \mathbb{C}$, and so a real number is also a scalar for the
vector space $\mathbb{C}^n$ over the scalar field $\mathbb{C}$. Addition and scalar multiplication
are done in the vector space $\mathbb{C}^n$ as if the elements of $\mathbb{C}^n$ are complex
row or column matrices in $\mathbb{C}^{1 \times n}$ or $\mathbb{C}^{n \times 1}$. Remember that if we are
doing arithmetic in $\mathbb{C}^n$, then $\mathbb{C}^n$ is the vector space and not the point set.


These two examples of the idea of vector space lead us to the following
general definition of a vector space $V$ over a field of scalars $F$. A field $F$ is
a set with at least two elements $0_F$ and $1_F$ in which we can do all of the
operations of arithmetic in the standard manner, such as in the field of
rational numbers $\mathbb{Q}$, the field of real numbers $\mathbb{R}$, and the field of complex
numbers $\mathbb{C}$. The integers, $\mathbb{Z}$, are not a field since we cannot do division in
$\mathbb{Z}$ and always stay in $\mathbb{Z}$. Note also that $\mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}$, which makes $\mathbb{Q}$ a
subfield of $\mathbb{R}$ and in turn $\mathbb{R}$ a subfield of $\mathbb{C}$.


For simplicity, a field $F$ in this text will be either $\mathbb{Q}$, $\mathbb{R}$, or $\mathbb{C}$. There are
infinitely many other fields $F$, but we will not need or study them here.


Definition 8.1.3. A vector space $V$ over the scalar field $F$ is a nonempty
set on which there are two operations of scalar multiplication (using the
scalars from $F$) and vector addition. The vector space $V$ must satisfy all of
the following properties:


Scalar multiplication | For any $\vec{x} \in V$ and scalar $a \in F$, $a\vec{x} \in V$

Vector addition | For all $\vec{x}, \vec{y} \in V$, $\vec{x} + \vec{y} \in V$


Additive Axioms


Associativity | For all $\vec{x}, \vec{y}, \vec{z} \in V$, $\vec{x} + (\vec{y} + \vec{z}) = (\vec{x} + \vec{y}) + \vec{z}$

Commutativity | For all $\vec{x}, \vec{y} \in V$, $\vec{x} + \vec{y} = \vec{y} + \vec{x}$

Existence of vector identity | There exists $\vec{0} \in V$ such that $\vec{0} + \vec{x} = \vec{x}$ for all $\vec{x} \in V$

Existence of additive inverse | For each $\vec{x} \in V$, there exists a $\vec{y} \in V$ such that $\vec{x} + \vec{y} = \vec{0}$ (the additive inverse is expressed as $-\vec{x}$)


Scalar Multiplication Axioms


Distributivity over vector addition | For any scalar $a \in F$ and all $\vec{x}, \vec{y} \in V$, $a(\vec{x} + \vec{y}) = a\vec{x} + a\vec{y}$

Distributivity over scalar addition | For any scalars $a, b \in F$ and all $\vec{x} \in V$, $(a + b)\vec{x} = a\vec{x} + b\vec{x}$

Distributivity over scalar multiplication | For any scalars $a, b \in F$ and all $\vec{x} \in V$, $(ab)\vec{x} = a(b\vec{x})$

Existence of scalar identity | There exists a scalar $1 \in F$ such that for all $\vec{x} \in V$, $1\vec{x} = \vec{x}$


On inspection of the closure properties and axioms in the definition of a
vector space $V$ over the scalar field $F$, it should be apparent that $\mathbb{R}^n$ is a
vector space over the field $\mathbb{R}$, but it is also a vector space over the field $\mathbb{Q}$,
while it is not a vector space over the field $\mathbb{C}$. Moreover, $\mathbb{C}^n$ is a vector
space over the field $\mathbb{C}$, but it is also a vector space over both of the fields
$\mathbb{Q}$ and $\mathbb{R}$.


Example 8.1.1. Let $V = \mathbb{Q}^{2 \times 3}$ be the set of all rational $2 \times 3$ matrices. Then $V$ is a
vector space over the field of scalars $\mathbb{Q}$ where vector addition and scalar multiplication
are the usual matrix addition and matrix multiplication by a scalar. Here $V$ is not a vector
space over either field $\mathbb{R}$ or $\mathbb{C}$, since in each of these cases the closure property of
scalar multiplication would fail.


Example 8.1.2. Let $V = \{ ax^2 + bx + c \mid a, b, c \in \mathbb{R} \}$ be the set of all polynomials of
degree $\le 2$ with real coefficients. Then $V$ is a vector space over the field of scalars $\mathbb{R}$
where vector addition and scalar multiplication are the usual polynomial addition and
multiplication by a scalar. Here $V$ is also a vector space over the field $\mathbb{Q}$, but it is not a
vector space over the field $\mathbb{C}$, since in this case the closure property of scalar
multiplication would fail.


Example 8.1.3. Let $V = \{ f : [0, 1] \to \mathbb{R} \mid f \text{ is a continuous function} \}$. A function $f$ is
called continuous if its graph can be drawn without raising your pen from the paper, that
is, its graph has no holes, jumps, or breaks of any kind. Examples of such functions' rules
are $\sin(x)$, $\cos(x)$, and $e^x$, while $\frac{1}{x}$ is not, since it has a jump at $x = 0$ due to its vertical
asymptote there. Then $V$ is a vector space over the field of scalars $\mathbb{R}$ where the vector
addition and scalar multiplication are the usual function addition and multiplication by a
scalar of the functions' rules. Now $V$ is also a vector space over the field $\mathbb{Q}$, but it is not
a vector space over the field $\mathbb{C}$, since in this case the closure property of scalar
multiplication would fail.


Example 8.1.4. Let $V = \{ a + b\cos(x) + c\cos(2x) \mid a, b, c \in \mathbb{Q} \}$. Then $V$ is a vector
space over the field of scalars $\mathbb{Q}$ where the vector addition and scalar multiplication are
the usual function addition and multiplication by a scalar. Here $V$ is not a vector space
over either the field $\mathbb{R}$ or $\mathbb{C}$, since in each of these cases the closure property of scalar
multiplication would fail.


Note that in all of these examples of a vector space $V$ over a scalar field $F$,
the field $F$ is intimately incorporated into the way in which the elements
of $V$ are defined. This is not a coincidence, but necessary to have some
hope that $V$ might be a vector space over the field $F$. As well, these
examples point out that if $V$ is a vector space over the scalar field $F$, then
$V$ is also a vector space over any subfield of $F$, but not necessarily over a superfield
of $F$. Both $\mathbb{Q}$ and $\mathbb{R}$ are subfields of $\mathbb{C}$, while $\mathbb{R}$ is a superfield of $\mathbb{Q}$ and
$\mathbb{C}$ is a superfield of both $\mathbb{Q}$ and $\mathbb{R}$.


In the remainder of this section and throughout the rest of this text, we
will restrict ourselves mainly to the vector space $\mathbb{R}^n$ over the field of
scalars $\mathbb{R}$ and occasionally to the vector space $\mathbb{C}^n$ over the field of scalars
$\mathbb{C}$. The reason for this restriction is to simplify our discussion to its basics,
but in fact all that we will do in $\mathbb{R}^n$ will also be true in any general
finite-dimensional vector space $V$. This is the case since every
finite-dimensional vector space $V$ over the scalar field $F$ of dimension $n$ is
essentially the same vector space as $F^n$ over the scalar field $F$, which is
very much like $\mathbb{R}^n$ and $\mathbb{C}^n$.


We are now interested in the case when a nonempty subset $S$ of the vector
space $\mathbb{R}^n$ over the field of scalars $\mathbb{R}$ is a vector space over the field of
scalars $\mathbb{R}$ in its own right. When this is true, we will say that $S$ is a
subspace of $\mathbb{R}^n$. In order to define more clearly what a subspace $S$ of $\mathbb{R}^n$
looks like, we need first to discuss linear combinations of elements of $\mathbb{R}^n$.


Definition 8.1.4. For any collection of $k$ elements $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$ of $\mathbb{R}^n$,
and any real scalars $a_1, a_2, \ldots, a_k$, we say that


$$\sum_{i=1}^{k} a_i \vec{v}_i = a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k$$


is a ($k$ element) linear combination of the vectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$.


Definition 8.1.5. A subspace $S$ of the vector space $\mathbb{R}^n$ over the field of
scalars $\mathbb{R}$ is any nonempty subset of $\mathbb{R}^n$, where any two-element linear
combination of elements from $S$ is also in $S$, that is, $S$ satisfies the closure
properties of a general vector space. In other words, $S$ is a subspace of $\mathbb{R}^n$
if for any two scalars $a, b \in \mathbb{R}$ and any two vectors $\vec{u}, \vec{v} \in S$, we have
$a\vec{u} + b\vec{v} \in S$.


Then $S$ is itself a vector space over the field $\mathbb{R}$, since it satisfies the
closure properties of a vector space and it inherits the addition and scalar
multiplication axioms from $\mathbb{R}^n$.


Note that the zero vector, $\vec{0}$, must be in every subspace $S$: since $S$ is
nonempty, we can pick any $\vec{v} \in S$ and take the linear combination
$0\vec{v} + 0\vec{v} = \vec{0}$. Also, if $\vec{v} \in S$, then $-\vec{v} \in S$ since $(-1)\vec{v} + 0\vec{v} = -\vec{v}$,
and moreover, all scalar multiples $a\vec{v}$ are in $S$ since
$a\vec{v} + 0\vec{v} = a\vec{v} \in S$. Thus, in a subspace $S$, we can do basic vector arithmetic
of addition and scalar multiplication in $S$ and stay in $S$. This mirrors
what happens when we perform addition, subtraction, and
multiplication of integers: by adding, subtracting, or multiplying any two
integers, you always end up with another integer.


Example 8.1.5. Now, we get to the promised plot, depicted in Figure 8.1, of the
two-element linear combinations of vectors, which we choose to be
$\vec{u} = (5, -7, 11)$ and $\vec{v} = (-3, 15, 9)$ in $\mathbb{R}^3$. In order to see that they
form the plane through the origin parallel to both $\vec{u}$ and $\vec{v}$, we will take the vectors
that are linear combinations as $s\vec{u} + t\vec{v}$ for real parameters $s$ and $t$, and plot
$\vec{u}$ and $\vec{v}$ as well.

u = {5, -7, 11}; v = {-3, 15, 9};
uvPlane = ParametricPlot3D[s u + t v, {s, -2, 2}, {t, -2, 2}, Mesh -> None,
PlotStyle -> {Blue, Opacity[0.5]}];

uvPlots = Graphics3D[{Arrowheads[.05], Thickness[.010], Black, Arrow[{{0, 0, 0},
u}], Red, Arrow[{{0, 0, 0}, v}]}];

uvTxtPlots = Graphics3D[{Black, Text["u", {6, -7, 12}], Text["v", {-3, 15, 11}]}];
Show[uvPlane, uvPlots, uvTxtPlots, ViewPoint -> {40, -20, -40}]


Figure 8.1: Vector space formed by two nonparallel vectors $\vec{u}$ and $\vec{v}$.


Example 8.1.6. We next consider the following subset of $\mathbb{R}^4$:
$$S = \{(a, 0, b, 0) \mid a, b \in \mathbb{R}\}$$


Can we show the closure properties hold? Let us check vector addition first:


$$(a_1, 0, b_1, 0) + (a_2, 0, b_2, 0) = (a_1 + a_2, 0, b_1 + b_2, 0)$$


Since the sum of two real numbers is another real number, the resulting vector sum is still
of the form $(a, 0, b, 0)$. Next, we check scalar multiplication:
$$\alpha(a, 0, b, 0) = (\alpha a, 0, \alpha b, 0)$$


Since the product of two real numbers is a real number, the resulting vector is also in $S$.
Therefore $S$ is a vector subspace of $\mathbb{R}^4$, and closely resembles $\mathbb{R}^2$.
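As a quick symbolic cross-check of these two closure computations, the following sketch forms a general two-element linear combination of elements of $S$ in Mathematica. The helper name inS and the symbols a1, b1, a2, b2, alpha, beta are hypothetical names chosen here for illustration; they are not part of the original example.

```mathematica
(* Hypothetical helper: tests whether a 4-vector has the form (a, 0, b, 0) *)
Clear[a1, b1, a2, b2, alpha, beta];
inS[w_] := w[[2]] === 0 && w[[4]] === 0

x = {a1, 0, b1, 0}; y = {a2, 0, b2, 0};   (* two generic elements of S *)
combo = alpha x + beta y                   (* general two-element linear combination *)
inS[combo]                                 (* True when the 2nd and 4th entries stay 0 *)
```

Because the second and fourth entries are sums of products with 0, they remain exactly 0 for every choice of the scalars, which is the closure property in Definition 8.1.5.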




Example 8.1.7. The set $S = \{(a, b, c, 0, 0) \mid a, b, c \in \mathbb{R}\}$ is a vector subspace
of $\mathbb{R}^5$ and looks like a copy of $\mathbb{R}^3$ inside of $\mathbb{R}^5$. To show this, a similar argument
can be used as in Example 8.1.6.


Example 8.1.8. The previous two examples were relatively straightforward. Let us now
attempt to discern why the following set is a subspace of $\mathbb{R}^6$:
$$S = \{(5a - 3b + 8c,\ a + 2b - c,\ -a + 7b - 4c,\ 0,\ 4a + b + 3c,\ 2a + 6b - 9c) \mid a, b, c \in \mathbb{R}\}$$


On inspection, notice that we can actually express $S$ as


(8.2) $$S = \{a\vec{u} + b\vec{v} + c\vec{w} \mid a, b, c \in \mathbb{R}\}$$


where


$$\vec{u} = (5, 1, -1, 0, 4, 2), \quad \vec{v} = (-3, 2, 7, 0, 1, 6), \quad \vec{w} = (8, -1, -4, 0, 3, -9)$$


This automatically means that $S$ is a subspace of $\mathbb{R}^6$, since on expressing $S$ in the
form of equation (8.2), $S$ satisfies the closure properties of a vector space.
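The rewriting of $S$ in the form (8.2) can be confirmed directly. The sketch below, which uses the hypothetical names u6, v6, w6, and general so as not to overwrite variables from other examples, subtracts the componentwise description of $S$ from the linear combination and simplifies.

```mathematica
Clear[a, b, c];
u6 = {5, 1, -1, 0, 4, 2}; v6 = {-3, 2, 7, 0, 1, 6}; w6 = {8, -1, -4, 0, 3, -9};

(* the general element of S as given componentwise in the example *)
general = {5 a - 3 b + 8 c, a + 2 b - c, -a + 7 b - 4 c, 0,
   4 a + b + 3 c, 2 a + 6 b - 9 c};

Simplify[a u6 + b v6 + c w6 - general]   (* the zero vector if the two forms agree *)
```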


Example 8.1.9. So far we have seen examples of what subspaces of $\mathbb{R}^n$ can be. The
following set is not a subspace of $\mathbb{R}^5$:
$$S = \{(a, b, c, 7, d) \mid a, b, c, d \in \mathbb{R}\}$$
To see this, add any two vectors from $S$ together:


$$(a_1, b_1, c_1, 7, d_1) + (a_2, b_2, c_2, 7, d_2) = (a_1 + a_2, b_1 + b_2, c_1 + c_2, 14, d_1 + d_2)$$


Notice that the fourth component is not 7, which implies that the vector addition given
above does not result in another vector in $S$, and hence $S$ does not satisfy the closure
properties of a vector subspace. We could have also used the fact that scalar
multiplication does not result in another vector of $S$, for if $\alpha \neq 1$, then


$$\alpha(a_1, b_1, c_1, 7, d_1) = (\alpha a_1, \alpha b_1, \alpha c_1, 7\alpha, \alpha d_1)$$


is also not a vector of $S$.


Example 8.1.10. Now let us consider the following subset of $\mathbb{R}^5$:


$$S = \{(0, 0, 0, 0, 0)\} = \{\vec{0}\}$$


Does the zero vector alone constitute a subspace of $\mathbb{R}^5$? Well, if we check the closure
properties, notice that $\vec{0} + \vec{0} = \vec{0}$ and $\alpha\vec{0} = \vec{0}$ for any scalar $\alpha$. One can conclude
that the zero vector of $\mathbb{R}^n$ forms a subspace of $\mathbb{R}^n$ always, and is also the smallest
subspace of $\mathbb{R}^n$.


Theorem 8.1.1. Let $\{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_k\}$ be any $k$ element subset of the vector
space $\mathbb{R}^n$. Then the subset $S$ of $\mathbb{R}^n$ consisting of all the linear
combinations of these $k$ elements is a subspace of $\mathbb{R}^n$.


Proof. The subset $S$ is defined as
$$S = \{a_1\vec{u}_1 + a_2\vec{u}_2 + \cdots + a_k\vec{u}_k \mid a_1, a_2, \ldots, a_k \in \mathbb{R}\}$$


If we form a two-element linear combination of any two elements of $S$, say
$a\sum a_i\vec{u}_i + b\sum b_i\vec{u}_i = \sum (a a_i + b b_i)\vec{u}_i$, then we get another
element of $S$, and so $S$ is a subspace of $\mathbb{R}^n$.


In $\mathbb{R}^n$, the intersection $U \cap V$ and sum $U + V$ of two subspaces $U$ and $V$
are also subspaces. The sum $U + V$ is the smallest subspace of $\mathbb{R}^n$ that
contains both the subspaces $U$ and $V$.


Definition 8.1.6. The sum $U + V$ of two subspaces of $\mathbb{R}^n$ is defined by
(8.3) $$U + V = \{\vec{u} + \vec{v} \mid \vec{u} \in U \text{ and } \vec{v} \in V\}$$


Example 8.1.11. Let us define $U$ and $V$ to be the following two subsets of $\mathbb{R}^n$:
$$U = \{a\vec{u}_1 + b\vec{u}_2 \mid a, b \in \mathbb{R}\}, \quad V = \{a\vec{v}_1 + b\vec{v}_2 + c\vec{v}_3 \mid a, b, c \in \mathbb{R}\}$$


where $\vec{u}_1, \vec{u}_2, \vec{v}_1, \vec{v}_2, \vec{v}_3$ are five vectors of $\mathbb{R}^n$, with the added property that $\vec{v}_3 \in U$
as well. Then


$$U + V = \{a\vec{u}_1 + b\vec{u}_2 + c\vec{v}_1 + d\vec{v}_2 \mid a, b, c, d \in \mathbb{R}\}$$


is a subspace of $\mathbb{R}^n$. Notice that we have excluded the vector $\vec{v}_3$ in the definition of
$U + V$, as it can be expressed as a linear combination of vectors in $U$.


Example 8.1.12. Now we do an example involving subspaces from $\mathbb{R}^8$. Let


$$U = \{(a, b, 0, 0, c, d, 0, 0) \mid a, b, c, d \in \mathbb{R}\}, \quad V = \{(0, 0, a, b, 0, c, 0, 0) \mid a, b, c \in \mathbb{R}\}$$


which are clearly both vector subspaces of $\mathbb{R}^8$. Then
$$U + V = \{(a, b, c, d, e, f, 0, 0) \mid a, b, c, d, e, f \in \mathbb{R}\}$$


is a subspace of $\mathbb{R}^8$ and looks like a copy of $\mathbb{R}^6$ inside $\mathbb{R}^8$.
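One way to sanity-check the "copy of $\mathbb{R}^6$" claim is to stack spanning vectors of $U$ and $V$ and count how many are independent. This is only a sketch: it uses the built-in MatrixRank, which anticipates the ideas of independence and dimension developed later in this chapter, and the names e8, spanU, and spanV are hypothetical.

```mathematica
e8 = IdentityMatrix[8];          (* rows are the standard unit vectors of R^8 *)
spanU = e8[[{1, 2, 5, 6}]];      (* U is free in coordinates 1, 2, 5, 6 *)
spanV = e8[[{3, 4, 6}]];         (* V is free in coordinates 3, 4, 6 *)

(* number of independent vectors among the spanning vectors of U and V together *)
MatrixRank[Join[spanU, spanV]]
```

The sixth coordinate is shared by $U$ and $V$, so the seven stacked vectors contribute only six independent directions.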


The union of two subspaces $U$ and $V$ of $\mathbb{R}^n$, denoted $U \cup V$, is not
necessarily a subspace of $\mathbb{R}^n$. We can see this since two different planes $U$
and $V$ through the origin in $\mathbb{R}^3$ are different subspaces of $\mathbb{R}^3$, but if we
add an element $\vec{p}$ of $U$ that is not in $V$ to an element $\vec{q}$ of $V$ that is not in $U$,
then $\vec{p} + \vec{q}$ is not in either $U$ or $V$ and so is not in their union. As an example, consider
$U = \{(0, b) \mid b \in \mathbb{R}\}$ and $V = \{(a, 0) \mid a \in \mathbb{R}\}$. Clearly $U$ is a subspace of $\mathbb{R}^2$
and corresponds to the y-axis and similarly, $V$ corresponds to the x-axis.
By taking $(0, 1) \in U$ and $(1, 0) \in V$, note that $(0, 1) + (1, 0) = (1, 1)$ is not in
either subspace $U$ or $V$.


You should picture a subspace $S$ of $\mathbb{R}^n$ as any subset of $\mathbb{R}^n$ that contains
the origin and is flat or noncurving, that is, like a line or plane in $\mathbb{R}^3$.
However, you must keep in mind that a line or plane in $\mathbb{R}^3$ that does not
pass through the origin is not a subspace of $\mathbb{R}^3$, since the origin is in
every subspace.

Now we return to systems of linear equations. A system of linear
equations written as the matrix equation $A\vec{x} = \vec{0}$, where the matrix of
coefficients $A$ is a $k \times n$ matrix, requires that the column of variables $\vec{x}$
be an $n \times 1$ matrix, and the zero column $\vec{0}$ be a $k \times 1$ matrix. The matrix
equation (or linear system that it represents) $A\vec{x} = \vec{0}$ is said to be
homogeneous because of the zero column vector on the right-hand side (RHS), and
nonhomogeneous if we have the system $A\vec{x} = \vec{c}$ with $\vec{c} \neq \vec{0}$. This
linear system has $n$ variables making up the column matrix $\vec{x}$. We will
think of the solution column matrix $\vec{x}$ as a vector in $\mathbb{R}^n$, and the zero
column matrix $\vec{0}$ as the zero vector in $\mathbb{R}^k$.




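The set of solution vectors $\vec{x}$ in $\mathbb{R}^n$ to the matrix equation $A\vec{x} = \vec{0}$ is a
subspace $S$ of $\mathbb{R}^n$. It is nonempty, since $\vec{x} = \vec{0}$ is always a solution. To check
closure, we do a little algebra. Let $\vec{v}$ and $\vec{z}$
be two solutions to $A\vec{x} = \vec{0}$ and $a$ and $b$ be two real numbers. Then


(8.4) $$A(a\vec{v} + b\vec{z}) = a(A\vec{v}) + b(A\vec{z}) = a\vec{0} + b\vec{0} = \vec{0}$$


and so the linear combination $a\vec{v} + b\vec{z}$ is also a solution; thus the
solutions form a subspace of $\mathbb{R}^n$.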
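Equation (8.4) can be illustrated on a small case. In the sketch below, the matrix M is a hypothetical $2 \times 4$ example (not taken from the text), and the built-in NullSpace supplies two solutions n1 and n2 of the homogeneous system; an arbitrary linear combination of them is then checked to be a solution as well.

```mathematica
Clear[a, b];
M = {{1, 2, -1, 0}, {0, 1, 1, -1}};   (* hypothetical coefficient matrix *)
{n1, n2} = NullSpace[M];              (* two solutions of M.x = 0 *)

(* a general linear combination of two homogeneous solutions is again a solution *)
Expand[M . (a n1 + b n2)]
```

The result should be the zero vector of $\mathbb{R}^2$ for every choice of the symbols a and b, which mirrors the algebra in (8.4).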


On the other hand, the set of solution vectors $\vec{x}$ in $\mathbb{R}^n$ to the matrix
equation $A\vec{x} = \vec{c}$ is not a subspace of $\mathbb{R}^n$ if $\vec{c} \neq \vec{0}$. If we repeat our
calculation above for two solutions, $\vec{v}$ and $\vec{z}$, to $A\vec{x} = \vec{c}$, and let $a$ and
$b$ be two real numbers, then


(8.5) $$A(a\vec{v} + b\vec{z}) = a(A\vec{v}) + b(A\vec{z}) = a\vec{c} + b\vec{c} = (a + b)\vec{c}$$

which is not necessarily $\vec{c}$, and so the linear combination $a\vec{v} + b\vec{z}$ is
not necessarily a solution.


On comparing equations (8.4) and (8.5), we can see that the relationship
between solutions to $A\vec{x} = \vec{0}$ and $A\vec{x} = \vec{c}$ is quite intimate. If $\vec{v}$ and
$\vec{z}$ are any two solutions to $A\vec{x} = \vec{c}$ for $\vec{c} \neq \vec{0}$, then their difference
$\vec{v} - \vec{z}$ is a solution to $A\vec{x} = \vec{0}$, since
$$A(\vec{v} - \vec{z}) = A\vec{v} - A\vec{z} = \vec{c} - \vec{c} = \vec{0}$$
This says that given any solution $\vec{v}$ and a fixed particular solution $\vec{y}_f$ to
$A\vec{x} = \vec{c}$ for $\vec{c} \neq \vec{0}$, there is a solution $\vec{w}$ with $A\vec{w} = \vec{0}$ so that
$\vec{v} = \vec{w} + \vec{y}_f$. In other words, if $T$ is the set of solutions to $A\vec{x} = \vec{c}$, then


(8.6) $$T = \{\vec{w} + \vec{y}_f \mid \vec{w} \text{ satisfies } A\vec{x} = \vec{0} \text{ and } \vec{y}_f \text{ satisfies } A\vec{x} = \vec{c}\}$$


Thus $T = S + \vec{y}_f$, where $S$ is the subspace of $\mathbb{R}^n$ consisting of the
solutions to $A\vec{x} = \vec{0}$ and $\vec{y}_f$ is any fixed particular solution to $A\vec{x} = \vec{c}$.
Here $T$ is said to be a translate of the subspace $S$ by the vector $\vec{y}_f$; that is,
$T$ is parallel to $S$ and shifted so that $T$ goes through $\vec{y}_f$ instead of the
origin.


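Example 8.1.13. Let us now have Mathematica do an example of this for us. Let our
nonhomogeneous system be


$$\begin{aligned} 11x - 8y + 3z &= -5 \\ -7x + y - 6z &= 2 \end{aligned}$$


Its matrix of coefficients $A$ and the column matrix $\vec{c}$ are given respectively by


$$A = \begin{pmatrix} 11 & -8 & 3 \\ -7 & 1 & -6 \end{pmatrix}, \quad \vec{c} = \begin{pmatrix} -5 \\ 2 \end{pmatrix}$$

A = {{11, -8, 3}, {-7, 1, -6}}; c = {{-5}, {2}}; RowReduce[Join[A, c, 2]] //
MatrixForm

$$\begin{pmatrix} 1 & 0 & 1 & -\frac{11}{45} \\ 0 & 1 & 1 & \frac{13}{45} \end{pmatrix}$$


From this rref matrix, we see that the solution to this nonhomogeneous linear system is
one-dimensional, and given by


$$x = -z - \frac{11}{45}, \quad y = -z + \frac{13}{45}$$
This solution can be written using column vectors as
$$\vec{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -z - \frac{11}{45} \\ -z + \frac{13}{45} \\ z \end{pmatrix} = z\begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix} + \begin{pmatrix} -\frac{11}{45} \\ \frac{13}{45} \\ 0 \end{pmatrix}$$
The column vector $\vec{y}_f = \left(-\frac{11}{45}, \frac{13}{45}, 0\right) = \begin{bmatrix} -\frac{11}{45} & \frac{13}{45} & 0 \end{bmatrix}^T$ is a particular
fixed solution to this nonhomogeneous system, corresponding to $z = 0$.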


RowReduce[Join[A, {{0}, {0}}, 2]] // MatrixForm 


$$\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$




From this new rref matrix, we see that the solution to this homogeneous linear system is
$x = -z$ and $y = -z$. This solution can be written using column vectors as


$$\vec{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -z \\ -z \\ z \end{pmatrix}$$


Note that if we add to all of these homogeneous solutions the particular solution $\vec{y}_f$, we
get all the nonhomogeneous solutions as expected. The solutions to the homogeneous
linear system form a line in space through the origin parallel to the vector
$(-1, -1, 1)$. The solutions to the nonhomogeneous linear system form a line in space
through the point $\left(-\frac{11}{45}, \frac{13}{45}, 0\right)$ parallel to the vector $(-1, -1, 1)$. In Figure 8.2,
we plot both the general solution subspace $S$ to the homogeneous system, given by
$$S = \{\vec{x} \mid \vec{x} = (-t, -t, t),\ t \in \mathbb{R}\}$$


along with the general solution subspace translate $T$ of the nonhomogeneous system,
given by


$$T = S + \vec{y}_f = \left\{\vec{x} \;\middle|\; \vec{x} = (-t, -t, t) + \left(-\frac{11}{45}, \frac{13}{45}, 0\right),\ t \in \mathbb{R}\right\}$$
In the following Mathematica code, we have changed the variable $z$ to the parameter $t$ to
set these solutions up parametrically:


S = {-t, -t, t}; P = {-11/45, 13/45, 0};
T = S + P


{-11/45 - t, 13/45 - t, t}


ParametricPlot3D[{S, T}, {t, -3, 3}, PlotStyle -> {{Red, Thickness[0.005]}, {Blue,
Thickness[0.005]}}]


Figure 8.2: The subspace $S$, and subspace translate $T$.
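The decomposition $T = S + \vec{y}_f$ from this example can also be cross-checked with built-in solvers rather than by reading off the rref. The following is a sketch only: the names Aex, cex, yf, and ns are hypothetical, LinearSolve returns some particular solution (not necessarily the one with $z = 0$ used above), and NullSpace returns a basis for the homogeneous solutions.

```mathematica
Clear[t];
Aex = {{11, -8, 3}, {-7, 1, -6}}; cex = {-5, 2};

yf = LinearSolve[Aex, cex];    (* one particular solution of Aex.x = cex *)
ns = First[NullSpace[Aex]];    (* a vector spanning the solutions of Aex.x = 0 *)

(* every point of the translate yf + t ns should solve the nonhomogeneous system *)
Simplify[Aex . (yf + t ns) - cex]
```

Since any two particular solutions differ by a homogeneous solution, the line traced out by yf + t ns is the same set $T$ regardless of which particular solution the solver returns.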




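Example 8.1.14. Now for one more example of a slightly larger system of linear
equations. Let our nonhomogeneous system be


$$\begin{aligned} 11x - 8y + 3z + 2w - u &= -5 \\ -7x + y - 6z + 12w + 4u &= 2 \end{aligned}$$


Its matrix of coefficients, $A$, and right side column vector, $\vec{c}$, are given by
$$A = \begin{pmatrix} 11 & -8 & 3 & 2 & -1 \\ -7 & 1 & -6 & 12 & 4 \end{pmatrix}, \quad \vec{c} = \begin{pmatrix} -5 \\ 2 \end{pmatrix}$$


A = {{11, -8, 3, 2, -1}, {-7, 1, -6, 12, 4}}; c = {{-5}, {2}}; RowReduce[Join[A, c, 2]]
// MatrixForm


$$\begin{pmatrix} 1 & 0 & 1 & -\frac{98}{45} & -\frac{31}{45} & -\frac{11}{45} \\ 0 & 1 & 1 & -\frac{146}{45} & -\frac{37}{45} & \frac{13}{45} \end{pmatrix}$$


From this rref matrix, we see that the solution to this nonhomogeneous linear system is
given by


$$x = -z + \frac{98}{45}w + \frac{31}{45}u - \frac{11}{45}, \quad y = -z + \frac{146}{45}w + \frac{37}{45}u + \frac{13}{45}$$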




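Notice that the solution is three-dimensional, since the solution can be expressed in terms
of the three independent variables, $u$, $w$, and $z$. In column vector notation, we can express
the solution as a linear combination of three vectors, multiplied by the arbitrary scalars $u$,
$w$, and $z$:


$$\vec{x} = \begin{pmatrix} x \\ y \\ z \\ w \\ u \end{pmatrix} = \begin{pmatrix} -z + \frac{98}{45}w + \frac{31}{45}u - \frac{11}{45} \\ -z + \frac{146}{45}w + \frac{37}{45}u + \frac{13}{45} \\ z \\ w \\ u \end{pmatrix} = z\begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + w\begin{pmatrix} \frac{98}{45} \\ \frac{146}{45} \\ 0 \\ 1 \\ 0 \end{pmatrix} + u\begin{pmatrix} \frac{31}{45} \\ \frac{37}{45} \\ 0 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} -\frac{11}{45} \\ \frac{13}{45} \\ 0 \\ 0 \\ 0 \end{pmatrix}$$


The column vector


$$\vec{y}_f = \left(-\frac{11}{45}, \frac{13}{45}, 0, 0, 0\right) = \begin{bmatrix} -\frac{11}{45} & \frac{13}{45} & 0 & 0 & 0 \end{bmatrix}^T$$


is a particular fixed solution to this nonhomogeneous system corresponding to the choices
$z = 0$, $w = 0$, and $u = 0$ in the solution. Notice that the solution $\vec{x}$ is a three-element
linear combination in $\mathbb{R}^5$ of the three vectors


$$(-1, -1, 1, 0, 0), \quad \left(\frac{98}{45}, \frac{146}{45}, 0, 1, 0\right), \quad \left(\frac{31}{45}, \frac{37}{45}, 0, 0, 1\right)$$


with the fixed vector $\vec{y}_f$ added to every linear combination. The general solution to the
homogeneous linear system $A\vec{x} = \vec{0}$ should now be


$$\vec{x} = z\begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + w\begin{pmatrix} \frac{98}{45} \\ \frac{146}{45} \\ 0 \\ 1 \\ 0 \end{pmatrix} + u\begin{pmatrix} \frac{31}{45} \\ \frac{37}{45} \\ 0 \\ 0 \\ 1 \end{pmatrix}$$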


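Let us check this by using rref on its augmented matrix. Note that the solution
$\vec{x} = \begin{bmatrix} -1 & -1 & 1 & 0 & 0 \end{bmatrix}^T$ is obtained by setting $z = 1$, $w = 0$, and $u = 0$ in the solution to the
homogeneous linear system, the solution $\vec{x} = \begin{bmatrix} \frac{98}{45} & \frac{146}{45} & 0 & 1 & 0 \end{bmatrix}^T$ is
obtained by setting $z = 0$, $w = 1$, and $u = 0$ in the solution to the homogeneous linear
system, and the solution $\vec{x} = \begin{bmatrix} \frac{31}{45} & \frac{37}{45} & 0 & 0 & 1 \end{bmatrix}^T$ is obtained by setting
$z = 0$, $w = 0$, and $u = 1$ in the homogeneous linear system.


RowReduce[Join[A, {{0}, {0}}, 2]] // MatrixForm


$$\begin{pmatrix} 1 & 0 & 1 & -\frac{98}{45} & -\frac{31}{45} & 0 \\ 0 & 1 & 1 & -\frac{146}{45} & -\frac{37}{45} & 0 \end{pmatrix}$$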


One last point to make before we end this section. When considering
solutions to linear systems geometrically as subspaces, or subspace
translates, of $\mathbb{R}^n$, we arrive at the following fact: A linear system
of equations has either no solution, exactly one solution, or infinitely
many solutions. The reason for this is that all solutions to a given linear
system, when any exist, form either a subspace or a subspace translate of $\mathbb{R}^n$. Using the
properties of subspaces, we can argue that only nonhomogeneous linear
systems can have no solution, in which case there is no fixed particular
solution $\vec{y}_f$. If there is a single solution, then in the homogeneous case,
this would correspond to the subspace given only by the zero vector,
which is the only subspace of $\mathbb{R}^n$ that contains only a single vector. A
single solution to a nonhomogeneous system corresponds to a subspace
translate of the zero vector solution from the homogeneous case. If there is
a nonzero vector $\vec{v}$ in the subspace corresponding to the solution to the
homogeneous system, then there are infinitely many solutions, since
all scalar multiples of $\vec{v}$ must also be in the subspace, and must
therefore be solutions to the homogeneous system. The same can be said
about the subspace translate for the nonhomogeneous system. If we now
look at this last case algebraically, in terms of matrix equations, we can
show through sheer force of algebra that this must be true.


Let $A\vec{x} = \vec{c}$ be our matrix equation for a linear system, and let us assume
that the system has two different solutions, $\vec{v}$ and $\vec{z}$, where $\vec{v} \neq \vec{z}$.
Then $\vec{z} - \vec{v} \neq \vec{0}$. If $r \in \mathbb{R}$, then for $\vec{w} = \vec{v} + r(\vec{z} - \vec{v})$, we have


$$\begin{aligned} A\vec{w} &= A\vec{v} + rA(\vec{z} - \vec{v}) \\ &= \vec{c} + r(A\vec{z} - A\vec{v}) \\ &= \vec{c} + r(\vec{c} - \vec{c}) \\ &= \vec{c} + r\vec{0} \\ &= \vec{c} \end{aligned}$$


Since $\vec{z} - \vec{v} \neq \vec{0}$, the vectors $\vec{w} = \vec{v} + r(\vec{z} - \vec{v})$ are distinct solutions to $A\vec{x} = \vec{c}$ as $r$
varies over the real numbers; hence we have a method of
constructing infinitely many solutions from a single pair of
solutions. This tells us that given any two different solutions $\vec{v}$ and $\vec{z}$ to
$A\vec{x} = \vec{c}$, all points on the line $\vec{x} = \vec{v} + r(\vec{z} - \vec{v})$ through $\vec{v}$ parallel to $\vec{z} - \vec{v}$ are
also solutions to $A\vec{x} = \vec{c}$.
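The algebra above can be tried out on the system from Example 8.1.14. In this sketch the names A5, c5, v5, z5, and w5 are hypothetical; v5 is the solution with $z = w = u = 0$ and z5 is the solution with $z = 1$, $w = u = 0$, both read off from the general solution in that example.

```mathematica
Clear[r];
A5 = {{11, -8, 3, 2, -1}, {-7, 1, -6, 12, 4}}; c5 = {-5, 2};

v5 = {-11/45, 13/45, 0, 0, 0};    (* solution with z = w = u = 0 *)
z5 = {-56/45, -32/45, 1, 0, 0};   (* solution with z = 1, w = u = 0 *)

w5 = v5 + r (z5 - v5);            (* the line through v5 parallel to z5 - v5 *)
Simplify[A5 . w5 - c5]            (* zero vector for every real r *)
```

The parameter r is left symbolic, so a zero result confirms that the whole line consists of solutions rather than just a few sampled points.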


Homework Problems 


1. Determine if each of the following sets defines a vector space
over a field $F$:


(a) $V = \{(x, y, 0) \mid x, y \in \mathbb{R}\}$

(b) $V = \{(x, 1, z) \mid x, z \in \mathbb{Q}\}$

(c) $V = \{P(x) \mid P(x) \text{ is a polynomial with real coefficients}\}$

(d) $V = \{P(x) \mid P(x) \text{ is a cubic polynomial with complex coefficients}\}$

(e) $V = \{(x, y, z) \mid x, y, z \ge 0\}$

(f) $V = \mathbb{Q}^{4 \times 3}$, which are the rational $4 \times 3$ matrices

(g) $V = \mathbb{Z}^{3 \times 3}$, which are the integer $3 \times 3$ matrices

(h) $V$ is the set of all polynomials with integer coefficients.


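2. Prove or disprove that the following vector subspace unions,
$U \cup V$, are themselves vector subspaces:


(a) $U = \{(0, y, 0) \mid y \in \mathbb{R}\}$, $V = \{(0, 0, z) \mid z \in \mathbb{R}\}$

(b) $U = \{(x, y, 0) \mid x, y \in \mathbb{R}\}$, $V = \{(0, 0, z) \mid z \in \mathbb{R}\}$

(c) $U = \{(x, y, 0) \mid x, y \in \mathbb{R}\}$, $V = \{(x, 0, 0) \mid x \in \mathbb{R}\}$


3. Express the sum $U + V$ of the following vector subspaces $U$ and
$V$ as a single vector subspace:


(a) $U = \{(0, a, 0) \mid a \in \mathbb{R}\}$, $V = \{(0, 0, b) \mid b \in \mathbb{R}\}$

(b) $U = \{(a, b, 0) \mid a, b \in \mathbb{R}\}$, $V = \{(0, 0, c) \mid c \in \mathbb{R}\}$

(c) $U = \{(a, b, 0) \mid a, b \in \mathbb{R}\}$, $V = \{(a, 0, 0) \mid a \in \mathbb{R}\}$

(d) $U = \{(a, 0, b, 0, c, d, e, 0) \mid a, b, c, d, e \in \mathbb{R}\}$,
$V = \{(0, a, b, c, 0, 0, d, 0) \mid a, b, c, d \in \mathbb{R}\}$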
(e) U= a e aE Dre r 


V= {adj +b +cv3 +d |a,b,c,d ER}, 
for fixed vectors 0}, U2, U2, Uf, 03, T3, Tf € R°, where i, u € V 


4. Express the solutions to the following homogeneous systems as
vector subspaces of $\mathbb{R}^n$; also state the dimension of each solution:


(a) $3x - 6y + 5z = 0$          (b) $-2x + 5y - 7z = 0$
    $-x + 3y - 2z = 0$              $x - 2y - 3z = 0$


(c) $2w - 3x + 4y + 6z = 0$     (d) $-w + x + y + z = 0$
    $w + 3x + y + 2z = 0$           $-2w + x + y = 0$
    $2w + 5x + y - 2z = 0$



5. Express the solutions to the following nonhomogeneous systems
as vector subspace translates of $\mathbb{R}^n$; also state the dimension of each
solution:


(a) $3x - 6y + 5z = -1$         (b) $-2x + 5y - 7z = 2$
    $-x + 3y - 2z = 5$              $x - 2y - 3z = 3$


(c) $2w - 3x + 4y + 6z = -2$    (d) $-w + x + y + z = 2$
    $w + 3x + y + 2z = -1$          $-2w + x + y = 1$
    $2w + 5x + y - 2z = 3$


6. Explain why the following sets $S$ are or are not subspaces of the
appropriate vector space $\mathbb{R}^n$:


(a) $S = \{(a + b + 5, 2a + 7b - 1, a - b + 4) \mid a, b \in \mathbb{R}\}$

(b) $S = \{(4a - b, -5a + 7b, a - 3b, -10a + 7b) \mid a, b \in \mathbb{R}\}$

(c) $S = \{(5, a + 3b, 0, a - b, 7) \mid a, b \in \mathbb{R}\}$

(d) $S = \{(a + b, 0, 2a + 7b, 0, a - b, 0, 0) \mid a, b \in \mathbb{R}\}$


7. Show that $U \cap V$ is a subspace of $\mathbb{R}^n$ if both $U$ and $V$ are
subspaces of $\mathbb{R}^n$. Does this generalize to the intersection of any
finite number of subspaces?


8. Find $U \cap V$ for the following vector subspaces.

(a) $U = \{(a, 0, b, 0) \mid a, b \in \mathbb{R}\}$, $V = \{(0, a, b, c) \mid a, b, c \in \mathbb{R}\}$

(b) $U = \{(a, 0, b, 0, c, d, e, 0) \mid a, b, c, d, e \in \mathbb{R}\}$,
$V = \{(0, a, b, c, 0, 0, d, 0) \mid a, b, c, d \in \mathbb{R}\}$

(c) $U = \{(a, 0, b, 0, c, d, 0, e) \mid a, b, c, d, e \in \mathbb{R}\}$,
$V = \{(0, a, b, c, 0, 0, d, e) \mid a, b, c, d, e \in \mathbb{R}\}$

9. Let $U$ be the subspace of $\mathbb{R}^n$ that is the solution space to the
homogeneous linear system $A\vec{x} = \vec{0}$ and $V$ be the subspace of $\mathbb{R}^n$
that is the solution space to another homogeneous linear system
$B\vec{x} = \vec{0}$. What homogeneous linear system has the subspace $U \cap V$
as its solution space? Is there a homogeneous linear system that has
the subspace $U + V$ as its solution space?


Mathematica Problems 


1. Parameterize and then plot the vector subspaces corresponding to
the solution spaces of the following homogeneous systems:


(a) $3x - 6y = 0$    (b) $x - 2y = 0$    (c) $x + 2y - 3z = 0$


(d) $2x - 5y + 7z = 0$    (e) $2x - 5y + 7z = 0$    (f) $2x - 5y + 7z = 0$
$x + 2y - 3z = 0$    $x + y + z = 0$

2. Parameterize and then plot the vector subspace translates
corresponding to the solutions to the following nonhomogeneous
systems:


(a) $3x - 6y = -1$    (b) $x - 2y = 3$    (c) $x + 2y - 3z = 5$


(d) $2x - 5y + 7z = -2$    (e) $2x - 5y + 7z = 1$    (f) $2x - 5y + 7z = 9$
$x + 2y - 3z = -2$    $x + y + z = -2$


3. Plot the corresponding pairs of vector subspaces and their
translates, from problems 1 and 2, respectively, together for each of
parts (a)-(f).

4. Solve the following systems of linear equations giving the
solutions to both the corresponding homogeneous system and the
given system. Write these solutions as linear combinations of
vectors and/or the translate of a linear combination. Also, give the
dimension of each solution:




(a) $6x + y - 2z + 8w + 2u - 7v = -9$
$3x - y + 5z - 2w + 9u - v = 2$


(b) $-5x + y + 2z - 8w + u - v - 3r = -25$
$2x - y + 7z - 9w + u - 4v + 2r = 11$
$7x + 2y - 2z - w + 3u + v - r = 5$


(c) $7x + 2y - 2z - w + 3u + v + 6r - 4s + 5t = 5$
$3x - y + 5z - 2w + 9u - v + r - s + t = -4$
$6x + y - 2z + 8w + 2u - 7v - 3r + s - t = 7$
$-5x + y + 2z - 8w + u - v - r + 5s - 2t = -25$

(d) $-5x + y + 2z = -25$
$3x - y + 5z = -4$
$6x + y - 2z =$
$7x + 2y - 2z = 5$
5. How are the dimensions of the solutions in problem 4 related to 
the number of equations and the number of variables in each 
system? 
6. Verify, with examples using Mathematica, your answers to 
homework problem 9. 


8.2 Independent and
Dependent Sets of Vectors in $\mathbb{R}^n$


As we have already seen, the general solution $S$ to a homogeneous linear
system $A\vec{x} = \vec{0}$ forms a subspace, and the subspace itself can be
expressed as all possible linear combinations of just a few of the solutions
to the system. On the other hand, the general solution to a
nonhomogeneous linear system $A\vec{x} = \vec{c}$ does not form a subspace,
although it can be written (as defined in the last section) as $T = S + \vec{y}_f$,
where $\vec{y}_f$ is any fixed particular solution to the nonhomogeneous system.
So $T$ is a translate of $S$ by $\vec{y}_f$; that is, $T$ is parallel to $S$ passing through
$\vec{y}_f$ while $S$ passes through the origin.

Now, we want to answer an important question about how all of this fits
together. Can every subspace $S$ of $\mathbb{R}^n$ be obtained as the general solution
to a homogeneous linear system $A\vec{x} = \vec{0}$, and if so, how? We already
know that the general solution to a homogeneous linear system $A\vec{x} = \vec{0}$
is a subspace, and so it is logical to ask if the converse to this statement is
also true.


In order to answer this question, we first introduce the definition of a
spanning subset $K$ of a subspace $S$ of $\mathbb{R}^n$.


Definition 8.2.1. A finite subset $K = \{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_m\}$ of a subspace $S$
in $\mathbb{R}^n$ is said to span $S$ (or be a spanning set of $S$) if every vector in $S$ can
be expressed as a linear combination of vectors in $K$. In other words, for
each $\vec{v} \in S$ there exist scalars $a_1$ through $a_m$ such that

$$\vec{v} = a_1\vec{k}_1 + a_2\vec{k}_2 + \cdots + a_m\vec{k}_m$$

$K$ is a spanning set for the subspace $S$ if and only if


$$S = \{a_1\vec{k}_1 + a_2\vec{k}_2 + \cdots + a_m\vec{k}_m \mid a_1, a_2, \ldots, a_m \in \mathbb{R}\}$$


The next question to ask is: Does every subspace $S$ have a finite spanning set $K$? Here $\mathbb{R}^n$ is a subspace of itself and it has a finite spanning set $U$ called the standard basis given by

$$U = \{(1,0,0,\ldots,0),\ (0,1,0,\ldots,0),\ \ldots,\ (0,0,\ldots,0,1)\}$$

$U$ consists of the rows (or columns) that form the $n \times n$ identity matrix. We will introduce the definition of basis in Section 8.3; however, for now a basis should be regarded as a smallest-size spanning set. One important fact to keep in mind is that the vector space $\mathbb{R}^n$ has many other possible spanning sets; the standard basis given above is simply the easiest to construct and usually the most convenient to use.


If every general solution (subspace) $S$ to a homogeneous linear system $A\vec{x} = \vec{0}$ has a finite spanning set $K$, we need a method to find this set of vectors. The most obvious first step in finding this set is to row reduce the matrix $A$. But where to go from here? The remaining steps are as follows: Once the matrix $A$ has been row-reduced, we extract from it every nonidentity column (including any zero columns), negate each one, and then augment each vertically with the appropriate column of the identity matrix whose square dimension equals the number of nonidentity columns in the row-reduced matrix. This may seem like a complicated and strange set of instructions to follow; however, let us look at a quick example to see this process in action.


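Example 8.2.1. Consider the following homogeneous linear system:

$$\begin{pmatrix} 5 & -7 & 1 & 4 & -9 & 3 \\ -2 & -1 & 6 & 0 & -5 & 8 \\ 11 & 9 & -3 & 2 & 5 & 7 \end{pmatrix} \vec{x} = \vec{0}$$

A = {{5,-7,1,4,-9,3}, {-2,-1,6,0,-5,8}, {11,9,-3,2,5,7}}; (R = RowReduce[A]) // MatrixForm

$$\begin{pmatrix} 1 & 0 & 0 & \frac{13}{31} & -\frac{157}{341} & \frac{268}{341} \\ 0 & 1 & 0 & -\frac{8}{31} & \frac{285}{341} & \frac{126}{341} \\ 0 & 0 & 1 & \frac{3}{31} & -\frac{289}{341} & \frac{565}{341} \end{pmatrix}$$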


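The command Join allows you to append rows or columns to a matrix. We use it below to create the column vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$, and then output the vectors on one line using the Row command:

Row[{Transpose[{Join[-R[[All, 4]], {1, 0, 0}]}] // MatrixForm, Transpose[{Join[-R[[All, 5]], {0, 1, 0}]}] // MatrixForm, Transpose[{Join[-R[[All, 6]], {0, 0, 1}]}] // MatrixForm}]

$$\begin{pmatrix} -\frac{13}{31} \\ \frac{8}{31} \\ -\frac{3}{31} \\ 1 \\ 0 \\ 0 \end{pmatrix} \quad \begin{pmatrix} \frac{157}{341} \\ -\frac{285}{341} \\ \frac{289}{341} \\ 0 \\ 1 \\ 0 \end{pmatrix} \quad \begin{pmatrix} -\frac{268}{341} \\ -\frac{126}{341} \\ -\frac{565}{341} \\ 0 \\ 0 \\ 1 \end{pmatrix}$$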


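The claim is that the general solution $S$ to the given linear system $A\vec{x} = \vec{0}$ above consists of all the linear combinations of the three vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ given by

$$\vec{u} = \left(-\tfrac{13}{31}, \tfrac{8}{31}, -\tfrac{3}{31}, 1, 0, 0\right), \quad \vec{v} = \left(\tfrac{157}{341}, -\tfrac{285}{341}, \tfrac{289}{341}, 0, 1, 0\right), \quad \vec{w} = \left(-\tfrac{268}{341}, -\tfrac{126}{341}, -\tfrac{565}{341}, 0, 0, 1\right)$$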

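To see this, if $\vec{x} = (x_1, x_2, \ldots, x_6)$ is treated as a column matrix, then we have the following system of equations corresponding to the rref matrix:

$$\begin{aligned} x_1 + \tfrac{13}{31}x_4 - \tfrac{157}{341}x_5 + \tfrac{268}{341}x_6 &= 0 \\ x_2 - \tfrac{8}{31}x_4 + \tfrac{285}{341}x_5 + \tfrac{126}{341}x_6 &= 0 \\ x_3 + \tfrac{3}{31}x_4 - \tfrac{289}{341}x_5 + \tfrac{565}{341}x_6 &= 0 \end{aligned} \tag{8.7}$$

Solving for the variables $x_1$, $x_2$, and $x_3$ in the equations given in (8.7), we have

$$\begin{aligned} x_1 &= -\tfrac{13}{31}x_4 + \tfrac{157}{341}x_5 - \tfrac{268}{341}x_6 \\ x_2 &= \tfrac{8}{31}x_4 - \tfrac{285}{341}x_5 - \tfrac{126}{341}x_6 \\ x_3 &= -\tfrac{3}{31}x_4 + \tfrac{289}{341}x_5 - \tfrac{565}{341}x_6 \end{aligned} \tag{8.8}$$

Remember that our solution was of the form $\vec{x} = (x_1, x_2, \ldots, x_6)$, not $\vec{x} = (x_1, x_2, x_3)$. We can express the solution given above in equation (8.8) as follows: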


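$$\vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = x_4 \begin{pmatrix} -\frac{13}{31} \\ \frac{8}{31} \\ -\frac{3}{31} \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_5 \begin{pmatrix} \frac{157}{341} \\ -\frac{285}{341} \\ \frac{289}{341} \\ 0 \\ 1 \\ 0 \end{pmatrix} + x_6 \begin{pmatrix} -\frac{268}{341} \\ -\frac{126}{341} \\ -\frac{565}{341} \\ 0 \\ 0 \\ 1 \end{pmatrix}$$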


But notice that this is simply

$$\vec{x} = x_4\vec{u} + x_5\vec{v} + x_6\vec{w}$$

for the vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ defined previously. Any arbitrary values of the variables $x_4$, $x_5$, and $x_6$ in the definition of $\vec{x}$ will satisfy $A\vec{x} = \vec{0}$, and therefore we have shown that solutions to $A\vec{x} = \vec{0}$ can be expressed as linear combinations of the vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$. The subspace $S$ is therefore spanned by $K = \{\vec{u}, \vec{v}, \vec{w}\}$.
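As a quick check of Example 8.2.1, the sketch below rebuilds the three spanning vectors from the rref matrix and confirms that each one solves the system. The name `spanSet` is an assumption introduced here, and the construction relies on the three identity columns coming first, as they do in this example.

```mathematica
A = {{5, -7, 1, 4, -9, 3}, {-2, -1, 6, 0, -5, 8}, {11, 9, -3, 2, 5, 7}};
R = RowReduce[A];

(* negate each nonidentity column (columns 4 to 6) and stack the matching
   column of the 3 x 3 identity matrix beneath it *)
spanSet = Table[Join[-R[[All, 3 + j]], IdentityMatrix[3][[j]]], {j, 1, 3}];

(* each spanning vector should be sent to the zero vector by A *)
Table[A . k, {k, spanSet}]

(* the three vectors are independent, so the rank should be 3 *)
MatrixRank[spanSet]
```

The first result should be a list of three zero vectors and the second should be 3, in agreement with the built-in `NullSpace[A]`, which also returns three vectors.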


Now let us explain why each subspace $S$ of $\mathbb{R}^n$ has a spanning set $K$. Consider the following argument: If $S$ is a subspace consisting of just the zero vector, then we are done as it is its own spanning set, and $K = \{\vec{0}\}$. If $S \neq \{\vec{0}\}$, then let $\vec{k}_1$ be any nonzero vector in $S$. We are again done if $S$ is all scalar multiples of $\vec{k}_1$, as then $K = \{\vec{k}_1\}$. Assume that this is not the case, and let $S_1$ be the subspace consisting of all scalar multiples of $\vec{k}_1$. Then $S \neq S_1$, and $S_1$ defines a line through the origin in $\mathbb{R}^n$. Since $S \neq S_1$, there is a nonzero vector $\vec{k}_2$ in $S$ which is not in $S_1$. If $S$ is spanned by $K = \{\vec{k}_1, \vec{k}_2\}$, then we are done. Assume that this is not the case, and let $S_2$ be the subspace consisting of all linear combinations of $\vec{k}_1$ and $\vec{k}_2$; then $S \neq S_2$. We now have that $S_2$ defines a plane through the origin in $\mathbb{R}^n$, as opposed to the line through the origin given by $S_1$.


You can continue this process, but not forever. For a second, let us consider the case that we have worked through the process above to the point that we have $n + 1$ vectors $\{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_{n+1}\}$, where each successive vector in the set cannot be expressed as a linear combination of the previous vectors, and all are elements of $S$. As each one of these $\vec{k}_j$ vectors is produced, we have also created subspaces $S_j$ of $\mathbb{R}^n$, where $S_j$ is the set of all linear combinations of the vectors $\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_j$. Each time we have done this, we have produced a higher-dimensional subspace of $\mathbb{R}^n$ than before. Notice that $S_1$ is a line that is of dimension 1, while $S_2$ is a plane and of dimension 2, and then finally $S_{n+1}$ must have dimension $n + 1$. But $\mathbb{R}^n$ has dimension $n$, and so how can a subspace of it, such as $S_{n+1}$, have dimension $n + 1 > n$? This is not possible, and so this process must stop after no more than $n$ steps. Thus, every subspace $S$ of $\mathbb{R}^n$ has a spanning set $K$, and it does not need more than $n$ elements if it is chosen by this process. Of course, this process works only if our vector space is of a finite dimension $n$, but will always work when considering subspaces of $\mathbb{R}^n$ or $\mathbb{C}^n$. In this argument, hopefully you have realized that the subspace $S_n$ must be $\mathbb{R}^n$ itself, which implies that $\vec{k}_{n+1}$ cannot exist as constructed.
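The selection process in this argument can be imitated on a finite list of candidate vectors. The following is only a sketch under that assumption: `greedySpan` is a hypothetical helper name, and it uses the built-in `MatrixRank` to decide whether a new vector lies outside the span of the vectors already kept.

```mathematica
(* hypothetical helper: keep a vector only if it raises the rank of the
   vectors kept so far, that is, only if it is not already in their span *)
greedySpan[vectors_List] := Fold[
  If[MatrixRank[Append[#1, #2]] > Length[#1], Append[#1, #2], #1] &,
  {}, vectors]

(* illustrative candidates from R^3: the second and fourth are redundant *)
candidates = {{1, 0, 1}, {2, 0, 2}, {0, 1, 1}, {1, 1, 2}, {0, 0, 1}};
kept = greedySpan[candidates]
```

For these candidates the kept list should be {{1, 0, 1}, {0, 1, 1}, {0, 0, 1}}. It can never hold more than $n$ vectors of $\mathbb{R}^n$, which mirrors the stopping argument above.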


The process outlined above to explain why every subspace $S$ of $\mathbb{R}^n$ has a spanning set $K$ of at most $n$ elements suggests the following definitions of independent and dependent finite subsets of $\mathbb{R}^n$. Basically, we wish to eliminate from a spanning set $K$ of a subspace $S$ any element that is a linear combination of the rest of the elements of $K$, since such elements are unnecessary in writing out $S$ as linear combinations of $K$'s elements. If no one element of $K$ is a linear combination of the rest of $K$'s elements, then we say that $K$ is independent, and otherwise we call it dependent.


Definition 8.2.2. A finite set of vectors $K$ in $\mathbb{R}^n$ is called an independent set if no single element $\vec{w}$ of $K$ can be written as a linear combination of the remaining elements of $K$; that is, for

$$K = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$$

$K$ is independent if $\vec{v}_j \notin S_j$, where the subspace $S_j$ of $\mathbb{R}^n$ is spanned by the set

$$\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_{j-1}, \vec{v}_{j+1}, \ldots, \vec{v}_k\}$$

for all $1 \le j \le k$. Another way to say this is that $K$ is independent if the subspace $S$ of $\mathbb{R}^n$ spanned by $K$ is not spanned by any smaller subset of $K$.


Definition 8.2.3. A finite set of vectors $K$ in $\mathbb{R}^n$ is called a dependent set if at least one element $\vec{w}$ of $K$ can be written as a linear combination of the remaining elements of $K$; that is, for

$$K = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$$

$K$ is dependent if for at least one $j$, $\vec{v}_j \in S_j$, where the subspace $S_j$ of $\mathbb{R}^n$ is spanned by the set

$$\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_{j-1}, \vec{v}_{j+1}, \ldots, \vec{v}_k\}$$

Alternatively, $K$ is dependent if the subspace $S$ of $\mathbb{R}^n$ spanned by $K$ is spanned by some smaller subset of $K$.
Both definitions above can be regarded in the following context. Consider the set of vectors $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$; then the linear equation

$$a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k = \vec{0} \tag{8.10}$$

will always be satisfied if $a_j = 0$ for all $1 \le j \le k$. If this is the only way in which equation (8.10) can be satisfied, the set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly independent. If we can find a set of scalars $\{a_1, a_2, \ldots, a_k\}$, not all zero, that satisfy equation (8.10), then the set of vectors $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly dependent.
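Equation (8.10) is itself a homogeneous linear system in the unknown scalars, so nontrivial coefficients can be searched for with `NullSpace`. This is a minimal sketch with illustrative vectors; the names `vecs` and `coeffs` are assumptions made here.

```mathematica
(* three illustrative vectors in R^3, stored as the rows of vecs *)
vecs = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

(* solutions a of a1 v1 + a2 v2 + a3 v3 == 0 form the null space of Transpose[vecs] *)
coeffs = NullSpace[Transpose[vecs]];

(* a nonempty result means a nontrivial combination exists, so the set is dependent *)
Length[coeffs] > 0

(* the combination built from the first coefficient list should be the zero vector *)
First[coeffs] . vecs
```

An empty list from `NullSpace` would instead mean that only the trivial choice of scalars works, which is the independent case.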


Any finite set of vectors in $\mathbb{R}^n$ that contains the zero vector is automatically a dependent set, as is any finite set of vectors where one vector is a scalar multiple of another. The fastest way to check whether a finite set of vectors $\{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_j\}$ from $\mathbb{R}^n$ is a dependent or independent set is to find the rref of the matrix with these vectors as its rows. If you get at least one row of all zeros, the set is dependent; otherwise it is independent. The row of all zeros (if there are no row swaps using RowReduce) corresponds to a vector that is a linear combination of the previous row vectors; however, this is a somewhat ambiguous statement since any vector involved in the linear combination can be solved for in terms of the others.


Example 8.2.2. Now, let us check out how rref can tell us if a set of vectors is dependent or independent. Let $\vec{u} = (5, -7, 3, 1, -2)$, $\vec{v} = (-9, 0, 8, -4, 6)$, and $\vec{w} = -3\vec{u} + 10\vec{v}$ in $\mathbb{R}^5$. Note also that by the definition of $\vec{w}$, we have that $\vec{u} = \frac{10}{3}\vec{v} - \frac{1}{3}\vec{w}$ and $\vec{v} = \frac{1}{10}\vec{w} + \frac{3}{10}\vec{u}$. Clearly, the set $\{\vec{u}, \vec{v}, \vec{w}\}$ is dependent.

u = {5, -7, 3, 1, -2}; v = {-9, 0, 8, -4, 6}; w = -3 u + 10 v
{-105, 21, 71, -43, 66}


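RowReduce[{u, v, w}] // MatrixForm

$$\begin{pmatrix} 1 & 0 & -\frac{8}{9} & \frac{4}{9} & -\frac{2}{3} \\ 0 & 1 & -\frac{67}{63} & \frac{11}{63} & -\frac{4}{21} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

RowReduce[{w, v, u}] // MatrixForm

$$\begin{pmatrix} 1 & 0 & -\frac{8}{9} & \frac{4}{9} & -\frac{2}{3} \\ 0 & 1 & -\frac{67}{63} & \frac{11}{63} & -\frac{4}{21} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$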


Notice that the order of the vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$ as rows of the matrix is irrelevant to the rref matrix. In either case, we can conclude that this three-vector set is dependent in $\mathbb{R}^5$. The number of all-zero rows in the rref matrix indicates how many of the row vectors can be written as linear combinations of the rest. In this case, any one of these three vectors can be written as a linear combination of the remaining two vectors, as described prior to the example.
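The relations claimed in Example 8.2.2 can be confirmed directly. The sketch below uses the built-in `MatrixRank` as a shortcut for counting the nonzero rows of the rref matrix; that substitution is a choice made here rather than the method of the text.

```mathematica
u = {5, -7, 3, 1, -2}; v = {-9, 0, 8, -4, 6}; w = -3 u + 10 v;

(* fewer independent rows than vectors means the set is dependent *)
MatrixRank[{u, v, w}] < Length[{u, v, w}]

(* each vector can be solved for in terms of the other two *)
u == 10/3 v - 1/3 w
v == 1/10 w + 3/10 u

(* reordering the rows does not change the rref matrix *)
RowReduce[{u, v, w}] == RowReduce[{w, v, u}]
```

All four comparisons should evaluate to True.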


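Example 8.2.3. Now, we do another example by adding a fourth linearly dependent vector to the set presented above. If we define $\vec{z} = 2\vec{u} - 7\vec{v}$ and apply rref to all four vectors $\{\vec{u}, \vec{v}, \vec{w}, \vec{z}\}$, we should get two rows of all zeros at the bottom of the rref matrix:

z = 2 u - 7 v

{73, -14, -50, 30, -46}

(T = RowReduce[{w, u, z, v}]) // MatrixForm

$$\begin{pmatrix} 1 & 0 & -\frac{8}{9} & \frac{4}{9} & -\frac{2}{3} \\ 0 & 1 & -\frac{67}{63} & \frac{11}{63} & -\frac{4}{21} \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Keeping only the two nonzero rows gives the matrix $L$:

$$L = \begin{pmatrix} 1 & 0 & -\frac{8}{9} & \frac{4}{9} & -\frac{2}{3} \\ 0 & 1 & -\frac{67}{63} & \frac{11}{63} & -\frac{4}{21} \end{pmatrix}$$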
The set of vectors $\{\vec{u}, \vec{v}, \vec{w}, \vec{z}\}$ is a dependent set in $\mathbb{R}^5$, and they span the following subspace:

$$S = \{a\vec{u} + b\vec{v} + c\vec{w} + d\vec{z} \mid a, b, c, d \in \mathbb{R}\}$$

This subspace $S$ is the set of all linear combinations of the vectors $\vec{u}$ to $\vec{z}$. $S$ is also spanned by the set $\{\vec{u}, \vec{v}, \vec{w}\}$ and the set $\{\vec{u}, \vec{v}\}$. Of these three spanning sets, $\{\vec{u}, \vec{v}, \vec{w}, \vec{z}\}$ and $\{\vec{u}, \vec{v}, \vec{w}\}$ are dependent sets, while $\{\vec{u}, \vec{v}\}$ is an independent set. Note that the independent spanning set $\{\vec{u}, \vec{v}\}$ has the fewest number of elements at two, and the dimension of the subspace spanned by $\{\vec{u}, \vec{v}\}$ is 2 since $\vec{u}$ and $\vec{v}$ form a plane in $\mathbb{R}^5$. It should also be noted that the two rows of the matrix $L$ above also span the subspace $S$, and are an independent set.
The information presented above should make it fairly clear that, among the spanning sets $K$ of a subspace $S$ of $\mathbb{R}^n$, the spanning sets that contain only linearly independent vectors contain the fewest number of vectors. In fact, the number of elements of an independent spanning set $K$ for a subspace $S$ seems to be its dimension. We will address this idea in greater detail in Section 8.3.


We asked at the very beginning of this section whether every subspace $S$ of $\mathbb{R}^n$ is the solution subspace for some homogeneous linear system $A\vec{x} = \vec{0}$. Now, we can answer this question in the affirmative. If $S = \mathbb{R}^n$, then we can take the matrix $A$ to be any zero matrix with $n$ columns. If $S = \{\vec{0}\}$, the zero vector only, then we can take the matrix $A$ to be the $n \times n$ identity matrix $I_n$. So what happens if $S$ is a subspace of $\mathbb{R}^n$ that is neither the zero subspace nor all of $\mathbb{R}^n$?


Definition 8.2.4. Given a vector subspace $S$ of $\mathbb{R}^n$, the orthogonal subspace (or perpendicular subspace) to $S$, denoted $S^\perp$, is the subspace of $\mathbb{R}^n$ defined as

$$S^\perp = \{\vec{w} \in \mathbb{R}^n \mid \vec{w} \cdot \vec{v} = 0,\ \forall\, \vec{v} \in S\}$$

It is left to you to show that $S^\perp$ is truly a subspace of $\mathbb{R}^n$. Note also that the orthogonal subspace to $S^\perp$ is $S$ itself, that is, $(S^\perp)^\perp = S$.


Notice that if $S$ is a planar two-dimensional subspace of $\mathbb{R}^3$, then $S^\perp$ is the one-dimensional subspace of $\mathbb{R}^3$ that is the line through the origin perpendicular to the plane $S$. Similarly, if $S$ is the one-dimensional subspace of $\mathbb{R}^3$ that is a line through the origin, then $S^\perp$ is the two-dimensional subspace of $\mathbb{R}^3$ that is the plane through the origin perpendicular to the line $S$. However, if $S$ is a planar two-dimensional subspace of $\mathbb{R}^4$, then $S^\perp$ is the two-dimensional subspace of $\mathbb{R}^4$ that is the plane through the origin perpendicular to the plane $S$.


Since $S^\perp$ is a subspace of $\mathbb{R}^n$, it has a finite spanning set $K = \{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_m\}$. Let the matrix $A$ be formed by using the vectors in $K$ as the rows of $A$. Notice that this yields $A\vec{v} = \vec{0}$ for each $\vec{v} \in S$. Also, if $\vec{v}$ is a solution to $A\vec{x} = \vec{0}$, then $\vec{v}$ is orthogonal to each $\vec{k}_i$, and so $\vec{v}$ is orthogonal to each element of $S^\perp$, giving that $\vec{v}$ is in $(S^\perp)^\perp = S$. Thus, $S$ is the solution subspace to $A\vec{x} = \vec{0}$. Therefore, to construct a homogeneous linear system corresponding to a particular subspace $S$ of $\mathbb{R}^n$, we simply have to consider the orthogonal subspace to $S$.
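The construction just described can be sketched in Mathematica. The assumptions here are that `NullSpace` is used to produce a spanning set for $S^\perp$ (a shortcut not used in the text at this point), that the plane $S$ is an illustrative choice, and that the names `spanS` and `solutions` are introduced only for this sketch.

```mathematica
(* S is the plane in R^3 spanned by these two illustrative vectors *)
spanS = {{1, 0, 1}, {0, 1, 1}};

(* the rows of A span the orthogonal subspace of S *)
A = NullSpace[spanS];

(* every spanning vector of S should solve A.x == 0 *)
A . Transpose[spanS]

(* the solution space of A.x == 0 should be S itself: adding its spanning
   vectors to spanS does not raise the rank, and the dimensions agree *)
solutions = NullSpace[A];
MatrixRank[Join[spanS, solutions]] == MatrixRank[spanS] == Length[solutions]
```

The product should be a zero matrix and the final comparison should be True, so this particular $S$ is recovered as the solution subspace of $A\vec{x} = \vec{0}$.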


Example 8.2.4. We will first find the orthogonal subspace to

$$S = \{(a, 0, b, 0, c) \mid a, b, c \in \mathbb{R}\}$$

which is a dimension three subspace of $\mathbb{R}^5$. The orthogonal subspace consists of all the vectors in $\mathbb{R}^5$ that are orthogonal to every vector in $S$; therefore

$$S^\perp = \{(0, a, 0, b, 0) \mid a, b \in \mathbb{R}\}$$

is the dimension two subspace of $\mathbb{R}^5$ orthogonal to $S$. To see this, we simply have to show that the following dot product formula is satisfied, for arbitrary real numbers $a$ through $e$:

$$(a, 0, b, 0, c) \cdot (0, d, 0, e, 0) = 0$$

Note also that the dimension of the subspace and the dimension of the orthogonal subspace sum to five, the dimension of $\mathbb{R}^5$.


Example 8.2.5. Let us consider a slightly more complex example. The subspace

$$S = \{a(1, -3, 2, 5, -6, 4) + b(-9, 2, -1, 7, 1, 8) \mid a, b \in \mathbb{R}\}$$

which has dimension two in $\mathbb{R}^6$, has as its orthogonal complement the subspace

$$S^\perp = \{a(1, 17, 25, 0, 0, 0) + b(31, 52, 0, 25, 0, 0) + c(-9, -53, 0, 0, 25, 0) + d(32, 44, 0, 0, 0, 25) \mid a, b, c, d \in \mathbb{R}\}$$

The subspace $S^\perp$ has dimension four in $\mathbb{R}^6$, and consists of all linear combinations of the four basic solution vectors

$$\vec{w}_1 = (1, 17, 25, 0, 0, 0), \quad \vec{w}_2 = (31, 52, 0, 25, 0, 0), \quad \vec{w}_3 = (-9, -53, 0, 0, 25, 0), \quad \vec{w}_4 = (32, 44, 0, 0, 0, 25)$$

corresponding to the system of two linear equations given by the two dot products

$$(x, y, z, u, v, w) \cdot (1, -3, 2, 5, -6, 4) = 0$$

$$(x, y, z, u, v, w) \cdot (-9, 2, -1, 7, 1, 8) = 0$$


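These four basic solutions $\vec{w}_1$, $\vec{w}_2$, $\vec{w}_3$, and $\vec{w}_4$ to this system are a spanning set for this solution and can be found from the solution to the system written as column matrices (vectors). The solution as a column matrix is

$$\begin{pmatrix} x \\ y \\ z \\ u \\ v \\ w \end{pmatrix} = z \begin{pmatrix} \frac{1}{25} \\ \frac{17}{25} \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + u \begin{pmatrix} \frac{31}{25} \\ \frac{52}{25} \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + v \begin{pmatrix} -\frac{9}{25} \\ -\frac{53}{25} \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} + w \begin{pmatrix} \frac{32}{25} \\ \frac{44}{25} \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

where we get the four basic solutions $\vec{w}_1$, $\vec{w}_2$, $\vec{w}_3$, and $\vec{w}_4$ to this system by successively letting $z = 25$ and $u = v = w = 0$; $u = 25$ and $z = v = w = 0$; $v = 25$ and $z = u = w = 0$; $w = 25$ and $z = u = v = 0$. We use the 25 in order to eliminate fractions from our four basic solutions $\vec{w}_1$, $\vec{w}_2$, $\vec{w}_3$, and $\vec{w}_4$.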


A = {{1, -3, 2, 5, -6, 4, 0}, {-9, 2, -1, 7, 1, 8, 0}};
RowReduce[A] // MatrixForm

$$\begin{pmatrix} 1 & 0 & -\frac{1}{25} & -\frac{31}{25} & \frac{9}{25} & -\frac{32}{25} & 0 \\ 0 & 1 & -\frac{17}{25} & -\frac{52}{25} & \frac{53}{25} & -\frac{44}{25} & 0 \end{pmatrix}$$

Clear[x, y, z, u, v, w]
X = {x, y, z, u, v, w};
U = Drop[A[[1]], -1]; V = Drop[A[[2]], -1];
Solve[{X.U == 0, X.V == 0}, {x, y}]

{{x -> 1/25 (31 u - 9 v + 32 w + z), y -> 1/25 (52 u - 53 v + 44 w + 17 z)}}

w1 = {1, 17, 25, 0, 0, 0};
w2 = {31, 52, 0, 25, 0, 0};
w3 = {-9, -53, 0, 0, 25, 0};
w4 = {32, 44, 0, 0, 0, 25};

{U.w1, U.w2, U.w3, U.w4}

{0, 0, 0, 0}

{V.w1, V.w2, V.w3, V.w4}

{0, 0, 0, 0}



Homework Problems 


1. Compute the spanning set K for the subspace S corresponding to 
the solution of each of the following homogeneous linear systems. 


1-21 8-2 2 4 => 
wli -2 s|2=7 ofi 1 -1 4|?-3 
0 6 1 4 0 6 
Gl 2 -2 -9 -1/2= ol 2 |2-7 
-1 5 2 -1 =] 5 


2. For each of the homogeneous systems from problem 1, determine 
the maximum number of vectors that could possibly be in the 
spanning set. How does this compare to the actual number of 
vectors in the spanning set? 


3. Determine whether each of the following pairs of spanning sets $K_1$ and $K_2$ span the same subspace:

(a) $K_1 = \{(1, 0, 1), (0, 1, 1)\}$, $K_2 = \{(2, -1, 1), (-1, 5, 4)\}$

(b) $K_1 = \{(1, 0, 1), (0, 1, 1)\}$, $K_2 = \{(2, -1, 2), (-1, 5, 4)\}$

(c) $K_1 = \{(2, 3, -1), (3, 3, 1)\}$, $K_2 = \{(1, 0, 2), (-1, -6, 8), (2, 0, 5)\}$

(d) $K_1 = \{(-1, 0, 1, 0), (2, 1, 2, 2)\}$, $K_2 = \{(1, 1, 3, 2), (3, 1, 1, 2)\}$

(e) $K_1 = \{(2, 3, -4, 1), (-1, 2, -1, 1), (2, 1, -1, 1)\}$, $K_2 = \{(1, 3, -2, 2), (3, 6, -6, 2)\}$

(f) $K_1 = \{(2, 3, -4, 1), (-1, 2, -1, 1), (2, 1, -1, 1)\}$, $K_2 = \{(1, 3, -2, 2), (3, 6, -6, 3), (-5, 9, -8, 2)\}$
4. Prove that given a vector subspace $S$ of $\mathbb{R}^n$, $S^\perp$, the set of all vectors orthogonal to every vector of $S$, is also a vector subspace of $\mathbb{R}^n$.
5. Construct the spanning sets of the orthogonal subspace $S^\perp$ to the subspaces $S$ defined by the following spanning sets:

(a) $K = \{(-2, 3)\}$
(b) $K = \{(-2, 3, 3)\}$
(c) $K = \{(-2, 3, 3), (0, 3, -7)\}$
6. What is the orthogonal subspace to $\mathbb{R}^n$?
7. The following two sets, $K_1$ and $K_2$, span the same subspace. Explain what this implies about the vectors of $K_2$.

$K_1 = \{(1, 0, 1), (0, 1, 1)\}$, $K_2 = \{(2, -1, 1), (-1, 5, 4), (1, 4, 5)\}$
8. Let $S$ be a subspace of $\mathbb{R}^n$. What is $S + S^\perp$? Explain your answer.

9. Let $S$ be a subspace of $\mathbb{R}^n$. What is $S \cap S^\perp$? Explain your answer.

10. Let $S$ be a subspace of $\mathbb{R}^n$. What is the dimension of $S^\perp$? Explain your answer.

11. Find the orthogonal complement $S^\perp$ for the following subspaces $S$, and give their dimensions:

(a) $S = \{(0, 0, a, b, 0, c, 0) \mid a, b, c \in \mathbb{R}\}$

(b) $S = \{(a, 0, b, c, d, 0, 0, e) \mid a, b, c, d, e \in \mathbb{R}\}$


Mathematica Problems 


1. For each part of homework problem 3, place all the vectors from the pairs of spanning sets into one matrix. Row reduce the resulting matrix and interpret the results.

2. Compute the spanning set K for the subspace S corresponding to 
the solution of each of the following homogeneous linear systems: 




ae ae a a es es 

(a) k 2832 3|?-9 
a a a a e 

(bo) |4 283 2 3|/#=0 
5-49 112 2 
i oe ae S 
4-28 32 3 

© |5 59 ui 2 ?=0 
ELE emt 
1-21 8 O-1 
4-28 3 2 3 

@ j2 i3 7 8 ı|?=7 
ee oe 
1-2 1 8 0 -1 
42 8 3 2 3 

) |2 13 7 8 1/2%=+0 
2-11 310 O 
0 0 14 10 18 1 
Eer r E 
4-2 8 3 2 3 

M |2 1 3 7 8 1/#%=0 
2-111 3 10 O 


1-1 4 -12 -6 3 
3. Define $S$ to be the subset of $\mathbb{R}^7$ given by

$$S = \{a(-1, 3, 10, -5, -6, 4, -9) + b(9, 2, -4, 6, 1, 8, -3) + c(11, 7, 2, -8, 0, -1, 5) \mid a, b, c \in \mathbb{R}\}$$

Find the orthogonal complement $S^\perp$ and a spanning set for it. What are the likely dimensions of $S$ and $S^\perp$?

4. Define $S$ to be the subset of $\mathbb{R}^9$ given by

$$S = \{a(-1, 3, 10, -5, -6, 4, -9, 2, -7) + b(9, 2, -4, 6, 1, 8, -3, -5, 11) \mid a, b \in \mathbb{R}\}$$

Find the orthogonal complement $S^\perp$ and a spanning set for it. What are the dimensions of $S$ and $S^\perp$?


8.3 Basis and Dimension for Subspaces of $\mathbb{R}^n$


This section is the culmination of our discussion of subspaces, spanning sets, and the idea of dimension. In Section 8.2, we saw that the best type of spanning set $K$ for a subspace $S$ of $\mathbb{R}^n$ is one that is an independent set, since it eliminates unneeded elements of the spanning set and so gives a smaller spanning set.

Definition 8.3.1. Let $S$ be a subspace of $\mathbb{R}^n$. Then a basis $B$ of $S$ is an independent (minimal-length) spanning set for $S$; that is, $B$ is a basis for $S$ if $B$ is a finite spanning set for $S$ and there exists no proper subset of $B$ which also spans $S$.

Theorem 8.3.1. Every basis $B$ of a subspace $S$ of $\mathbb{R}^n$ has the same length, which is called the dimension of $S$.


Proof. We will argue that this statement is true using the power of rref. Let $S$ be a subspace of $\mathbb{R}^n$ with two bases $K_1 = \{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_s\}$ and $K_2 = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ of lengths $s$ and $m$, respectively, with $s > m$. Now let us form the two matrices $P$ and $Q$, where $P$ is the matrix whose rows are the elements of $K_1$ followed by $K_2$, while the matrix $Q$ is the matrix whose rows are the elements of $K_2$ followed by $K_1$:

$$P = \begin{pmatrix} \vec{u}_1 \\ \vdots \\ \vec{u}_s \\ \vec{v}_1 \\ \vdots \\ \vec{v}_m \end{pmatrix}, \qquad Q = \begin{pmatrix} \vec{v}_1 \\ \vdots \\ \vec{v}_m \\ \vec{u}_1 \\ \vdots \\ \vec{u}_s \end{pmatrix}$$

The first thing to notice is that both $P$ and $Q$ are of the same size $(s + m) \times n$. If we apply rref to $P$, the matrix rref($P$) has exactly $m$ rows of all zeros since every element of $K_2$ is a linear combination of the elements of $K_1$, and $K_1$ is an independent set. On the other hand, if we apply rref to $Q$, the matrix rref($Q$) has exactly $s$ rows of all zeros since every element of $K_1$ is a linear combination of the elements of $K_2$, and $K_2$ is an independent set. But we have already seen that rref gives the same matrix independent of the ordering of the rows of the matrix; thus rref($P$) is equal to rref($Q$). We therefore have a contradiction, since these two rref matrices have different numbers of zero rows. Hence, $s = m$, and any two bases of a subspace $S$ must have the same number of elements.


We can therefore conclude that a basis $B$ for a subspace $S$ is a minimal-length spanning set of the subspace $S$. The basis also gives the maximum possible number of linearly independent vectors that can be grouped together at any one time from the subspace $S$. We leave this last fact as something for the reader to verify. It should also be clear that if you have a spanning set $K$ for a subspace $S$, then you can get a basis for $S$ from $K$ by finding rref of the matrix whose rows are the elements of $K$ and then taking all of the nonzero rows as a basis for $S$.
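The recipe in the last sentence can be sketched with the spanning set from Example 8.2.3. The names `spanning` and `basis` are assumptions, and `MatrixRank` is used here only to count the nonzero rows, which always come first in the rref matrix.

```mathematica
u = {5, -7, 3, 1, -2}; v = {-9, 0, 8, -4, 6};
w = -3 u + 10 v; z = 2 u - 7 v;
spanning = {u, v, w, z};

(* keep the nonzero rows of the rref matrix as a basis of the spanned subspace *)
basis = Take[RowReduce[spanning], MatrixRank[spanning]]

(* the basis spans the same subspace: joining it to the original set
   does not raise the rank *)
MatrixRank[Join[basis, spanning]] == Length[basis]
```

The basis returned should be {{1, 0, -8/9, 4/9, -2/3}, {0, 1, -67/63, 11/63, -4/21}}, the two nonzero rows found in Example 8.2.3, and the final comparison should be True.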


Definition 8.3.2. The row rank of $A \in \mathbb{R}^{m \times n}$ is the number of nonzero rows in the matrix rref($A$), and it is the dimension of the subspace $S$ of $\mathbb{R}^n$ spanned by the rows of the matrix $A$.

Notationally, the dimension of a subspace $S$ will be denoted dim($S$) when actually performing a computation of the dimension of a subspace. The following two statements are equivalent: The dimension of $S$ is $k$, and dim($S$) = $k$.


Definition 8.3.3. The subspace $S$ of $\mathbb{R}^n$ spanned by the rows of the matrix $A$ is called the row space of $A$.

In a similar fashion, the column rank of a matrix $A$ is the number of nonzero rows in the matrix rref($A^T$), and it is the dimension of the subspace $Q$ of $\mathbb{R}^m$ spanned by the columns of the matrix $A$. This subspace $Q$ is called the column space of $A$.


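Example 8.3.1. As an example, let us compute the row and column ranks for the matrix $A$ given by

$$A = \begin{pmatrix} -1 & 5 & 2 & 9 & -3 \\ 4 & 1 & -6 & 0 & 7 \\ 3 & 6 & -4 & 9 & 4 \end{pmatrix}$$

The row rank of $A$ is the dimension $m$ of the subspace $S$ of $\mathbb{R}^5$ spanned by the rows of $A$. For this example, $1 \le m \le 3$. The column rank of $A$ is the dimension $n$ of the subspace $Q$ of $\mathbb{R}^3$ spanned by the columns of $A$, and it is the row rank of $A^T$. For this example, $1 \le n \le 5$; however, as we shall see, $n$ cannot be greater than the largest possible value of $m$ (and vice versa):

A = {{-1, 5, 2, 9, -3}, {4, 1, -6, 0, 7}, {3, 6, -4, 9, 4}}; RowReduce[A] // MatrixForm

$$\begin{pmatrix} 1 & 0 & -\frac{32}{21} & -\frac{3}{7} & \frac{38}{21} \\ 0 & 1 & \frac{2}{21} & \frac{12}{7} & -\frac{5}{21} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

RowReduce[Transpose[A]] // MatrixForm

$$\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$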


Note that the row rank of the matrix A is 2, and so is its column rank. Is this a 
coincidence or not? Let us do another example to test things out. 


Example 8.3.2. This time, we will use Mathematica's random number generator, via the RandomReal command, to construct a matrix whose entries are random values ranging from -1 to 1. Note that if you perform the following commands in Mathematica, your matrix will look different:

(RandMat = RandomReal[{-1, 1}, {5, 9}]) // MatrixForm

0.9914 0.79456 -0.75534 -0.66264 -0.30826 0.52101 -0.37524 -0.20102 0.25354
0.05835 -0.44699 -0.25517 -0.34587 0.7368 0.24266 0.48944 -0.64963 0.33939
-0.12618 0.034569 0.70435 -0.51803 -0.70793 -0.98889 0.90054 0.96371 0.67608
-0.77160 0.58541 0.71097 -0.2906 0.34618 0.12112 -0.16342 0.58310 0.47748
0.41568 -0.4062 0.34842 0.72675 -0.18709 -0.029196 -0.23208 -0.60710 0.73218


RowReduce[RandMat] // MatrixForm

1 0. 0. 0. 0. 0.380637 -0.160502 -0.940333 1.4607
0 1 0. 0. 0. 0.67458 -1.03964 0.158187 0.045272
0 0 1 0. 0. -0.204372 0.213229 -0.0517747 1.68055
0 0 0 1 0. 0.433682 -1.00084 -0.423137 -0.412551
0 0 0 0 1 0.841274 -0.349708 -0.927848 0.760785

RowReduce[Transpose[RandMat]] // MatrixForm

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

For the random matrix $A$ in Example 8.3.2, notice that both $A$ and $A^T$ have the same row ranks, which also means that $A$ has the same row and column rank, as both are equal to 5. This does not seem to be a coincidence, and it is indeed true that the row and column ranks of any matrix are equal! In fact, the word row (or column) is frequently dropped from the front of the term rank.
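A small experiment can support this claim before it is proved. The sketch below is an illustration only: the seed, the matrix size, and the entry range are arbitrary assumptions, and integer entries are used so that the ranks are computed exactly.

```mathematica
SeedRandom[1];   (* arbitrary seed so that the experiment is repeatable *)

(* twenty random 4 x 7 integer matrices with entries between -5 and 5 *)
trials = Table[RandomInteger[{-5, 5}, {4, 7}], {20}];

(* compare the rank of each matrix with the rank of its transpose *)
And @@ (MatrixRank[#] == MatrixRank[Transpose[#]] & /@ trials)
```

The result should be True for every choice of seed and size, although an experiment of this kind only suggests the theorem and does not replace the argument.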


Definition 8.3.4. The rank of matrix A is the dimension of the row space 
or column space of A. 
Now, let us see why this works. Let $A \in \mathbb{R}^{m \times n}$ and let $\vec{u}_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,n})$ for $1 \le i \le m$ be the rows of $A$ expressed as vectors in $\mathbb{R}^n$. Next, if we let $r$ be the row rank of $A$, then there is a basis $K = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_r\}$ for the subspace $S$ of $\mathbb{R}^n$ spanned by the rows of the matrix $A$. We can now write each row vector $\vec{u}_i$ of $A$ as a linear combination of the $\vec{v}_k$ vectors of the basis $K$. So for each row, we get the following expression

$$\vec{u}_i = b_{i,1}\vec{v}_1 + b_{i,2}\vec{v}_2 + \cdots + b_{i,r}\vec{v}_r \tag{8.12}$$

for scalars $b_{i,1}$ through $b_{i,r}$ for $1 \le i \le m$. We will call these our row equations, and from these, we get a matrix $B$ whose entries are the $b_{i,j}$ values. Now let each vector $\vec{v}_j$ from the basis $K$ be written as $\vec{v}_j = (c_{j,1}, c_{j,2}, \ldots, c_{j,n})$ for $1 \le j \le r$. Then, looking at our row equations in terms of components gives us

$$(a_{i,1}, a_{i,2}, \ldots, a_{i,n}) = b_{i,1}\vec{v}_1 + b_{i,2}\vec{v}_2 + \cdots + b_{i,r}\vec{v}_r = b_{i,1}(c_{1,1}, c_{1,2}, \ldots, c_{1,n}) + \cdots + b_{i,r}(c_{r,1}, c_{r,2}, \ldots, c_{r,n})$$

for $1 \le i \le m$. Now, by equating components for each $i$, we get a system of equations for each column of $A$. For the first components, we have


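$$\begin{aligned} a_{1,1} &= b_{1,1}c_{1,1} + b_{1,2}c_{2,1} + \cdots + b_{1,r}c_{r,1} \\ a_{2,1} &= b_{2,1}c_{1,1} + b_{2,2}c_{2,1} + \cdots + b_{2,r}c_{r,1} \\ &\ \ \vdots \\ a_{m,1} &= b_{m,1}c_{1,1} + b_{m,2}c_{2,1} + \cdots + b_{m,r}c_{r,1} \end{aligned} \tag{8.13}$$

So, if $\vec{w}_1$ is the first column of the matrix $A$, then our system of equations (8.13) above is

$$\vec{w}_1 = c_{1,1}\vec{B}_1 + c_{2,1}\vec{B}_2 + \cdots + c_{r,1}\vec{B}_r \tag{8.14}$$

where $\vec{B}_j$ is the $j$th column of the matrix $B$. Similarly, by equating second components for $1 \le i \le m$, we have

$$\vec{w}_2 = c_{1,2}\vec{B}_1 + c_{2,2}\vec{B}_2 + \cdots + c_{r,2}\vec{B}_r$$

where $\vec{w}_2$ is the second column of the matrix $A$. Thus, for the columns of $A$ given by the set $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_n\}$ we have

$$\vec{w}_j = c_{1,j}\vec{B}_1 + c_{2,j}\vec{B}_2 + \cdots + c_{r,j}\vec{B}_r \tag{8.15}$$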


for $1 \le j \le n$. Now we know that the columns of $B$, given by the set $\{\vec{B}_1, \vec{B}_2, \ldots, \vec{B}_r\}$, are a spanning set for the subspace $Q$ of $\mathbb{R}^m$ spanned by the columns of the matrix $A$. So we now know that the column rank of $A$ is less than or equal to the row rank of $A$. If we use $A^T$ instead of $A$ as above, then, by a similar argument, we get that the column rank of $A^T$ is less than or equal to the row rank of $A^T$, or that the row rank of $A$ is less than or equal to the column rank of $A$. Thus, the two ranks are equal.
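The row equations amount to the factorization $A = BC$, where the rows of $C$ are the basis vectors $\vec{v}_j$. The sketch below checks this for the matrix of Example 8.3.1, taking the nonzero rows of rref($A$) as the basis; with that choice the coefficients $b_{i,j}$ are simply the entries of $A$ in its first two columns, where the leading 1s sit. The names `Bmat` and `Cmat` are assumptions introduced here.

```mathematica
A = {{-1, 5, 2, 9, -3}, {4, 1, -6, 0, 7}, {3, 6, -4, 9, 4}};
r = MatrixRank[A];

(* rows of Cmat: the basis v1, ..., vr taken from the nonzero rows of rref(A) *)
Cmat = Take[RowReduce[A], r];

(* Bmat holds the coefficients b[i, j]; for this basis they are the entries
   of A in the columns that contain the leading 1s, here columns 1 and 2 *)
Bmat = A[[All, {1, 2}]];

(* the row equations say exactly that A == Bmat . Cmat *)
Bmat . Cmat == A
```

The comparison should return True. Every column of $A$ is then a combination of the $r$ columns of `Bmat`, which is the inequality used in the argument.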


Theorem 8.3.2. Every element of a subspace $S$ can be written in exactly one way as a linear combination of the elements of a basis.

Proof. Let $B = \{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_s\}$ be a basis for a subspace $S$. For an arbitrary $\vec{w} \in S$, since $B$ spans $S$, we have that

$$\vec{w} = a_1\vec{u}_1 + a_2\vec{u}_2 + \cdots + a_s\vec{u}_s \tag{8.16}$$

for scalars $a_k$, $1 \le k \le s$. Now, let us assume for an instant that the vector $\vec{w}$ can be written as a different linear combination of the basis vectors

$$\vec{w} = b_1\vec{u}_1 + b_2\vec{u}_2 + \cdots + b_s\vec{u}_s \tag{8.17}$$

for scalars $b_k$ with $1 \le k \le s$. Now, if we subtract equation (8.17) from (8.16), we get

$$(a_1 - b_1)\vec{u}_1 + (a_2 - b_2)\vec{u}_2 + \cdots + (a_s - b_s)\vec{u}_s = \vec{0} \tag{8.18}$$

The independence of the basis $B$ now tells us that all of these coefficients are 0, and so $a_j = b_j$ for $1 \le j \le s$. Thus, there is exactly one way of writing each element $\vec{w}$ of a subspace $S$ as a linear combination in a given basis $B$ of $S$.


Theorem 8.3.2 is sometimes used as the definition of a basis, as it is equivalent to combining the properties of independence and spanning. One other interesting fact about how bases work is that if $S$ is a subspace of $\mathbb{R}^n$ and has dimension $k$, then any linearly independent subset of $S$ having $k$ elements is automatically a basis of $S$. This also implies that any spanning subset of $S$ having $k$ elements is also automatically a basis of $S$. We now give an example.


Example 8.3.3. Define $B$ to be the subset of $\mathbb{R}^3$ given by the three vectors $\vec{u} = (-1, 5, -2)$, $\vec{v} = (7, -4, 3)$, and $\vec{w} = (10, 6, -9)$. Since there are three vectors in $B$, if $B$ is an independent set, then it will be a basis of $\mathbb{R}^3$. So by defining $A$ to be the matrix whose rows are the three vectors from $B$, we can compute rref($A$). If the result is $I_3$, then $B$ really is a basis. Here, $A$ is explicitly written as

$$A = \begin{pmatrix} -1 & 5 & -2 \\ 7 & -4 & 3 \\ 10 & 6 & -9 \end{pmatrix}$$

u = {-1, 5, -2}; v = {7, -4, 3}; w = {10, 6, -9};
A = {u, v, w};

RowReduce[A] // MatrixForm

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$


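Since $B$ truly is a basis, we can find the unique scalars $a$, $b$, and $c$ so that $\vec{s} = a\vec{u} + b\vec{v} + c\vec{w}$ for $\vec{s} = (8, 0, -5)$. Solving for the unknown constants, we have

$$\begin{pmatrix} a \\ b \\ c \end{pmatrix} = (A^T)^{-1}\vec{s} \tag{8.19}$$

where $\vec{s}$ is written as a column:

s = {8, 0, -5};
Join[Transpose[A], Transpose[{s}], 2] // MatrixForm

$$\begin{pmatrix} -1 & 7 & 10 & 8 \\ 5 & -4 & 6 & 0 \\ -2 & 3 & -9 & -5 \end{pmatrix}$$

(RowReduce[Join[Transpose[A], Transpose[{s}], 2]]) // MatrixForm

$$\begin{pmatrix} 1 & 0 & 0 & -\frac{266}{283} \\ 0 & 1 & 0 & -\frac{16}{283} \\ 0 & 0 & 1 & \frac{211}{283} \end{pmatrix}$$

(ConstVals = Inverse[Transpose[A]].s) // MatrixForm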


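$$\begin{pmatrix} -\frac{266}{283} \\ -\frac{16}{283} \\ \frac{211}{283} \end{pmatrix}$$

So we now have that $a = -\frac{266}{283}$, $b = -\frac{16}{283}$, and $c = \frac{211}{283}$. We verify this solution below:

ConstVals[[1]] u + ConstVals[[2]] v + ConstVals[[3]] w

{8, 0, -5}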


Note that if we write the unknown constants $a$, $b$, $c$ as a row vector (matrix of dimension $1 \times 3$) instead, we end up with the equation $\vec{s} = (a, b, c)A$. We can solve for the unknown constants simply enough:

$$(a, b, c) = \vec{s}A^{-1} \tag{8.20}$$

We have Mathematica perform the computations below. Notice that the solution remains the same:

s.Inverse[A] // MatrixForm
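The coordinates can also be found without forming an inverse. The sketch below uses the built-in `LinearSolve` as an alternative to equations (8.19) and (8.20), which is a choice made here rather than the method of the example; the name `coords` is an assumption.

```mathematica
u = {-1, 5, -2}; v = {7, -4, 3}; w = {10, 6, -9};
A = {u, v, w};
s = {8, 0, -5};

(* solve Transpose[A].{a, b, c} == s directly *)
coords = LinearSolve[Transpose[A], s]

(* the same numbers come from the row-vector form (8.20) *)
coords == s . Inverse[A]

(* and they rebuild s as a combination of u, v, and w *)
coords . A == s
```

The first result should be {-266/283, -16/283, 211/283}, and both comparisons should return True.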


We conclude this section with two lists of equivalent statements. The first is a list of equivalent statements for "A set $B$ of $k$ elements is a basis of a subspace $S$ of $\mathbb{R}^n$." (Note that if $k = n$, then $S = \mathbb{R}^n$.)

A set $B$ of $k$ elements is a basis of a subspace $S$ of $\mathbb{R}^n$.

(a) $B$ is an independent spanning set for the subspace $S$ of $\mathbb{R}^n$.
(b 


(d) $B$ is a minimal size spanning set for the subspace $S$ of $\mathbb{R}^n$.

(e) $B$ is a maximal size independent subset of the subspace $S$ of $\mathbb{R}^n$.

(f) The $k \times n$ matrix $M_B$, whose rows are the elements of the set $B$, has rank $k$.


The second list of equivalent statements is for "The square $n \times n$ linear system $A\vec{x} = \vec{b}$ has a unique solution for $\vec{x}$."

The square $n \times n$ linear system $A\vec{x} = \vec{b}$ has a unique solution for $\vec{x}$.

(a) The $n \times n$ matrix $A$ has an inverse $A^{-1}$, which gives the unique solution $\vec{x} = A^{-1}\vec{b}$.

(b) det($A$) $\neq 0$.
(c) rref($A$) = $I_n$.

(d) The rank of $A$ is $n$.

(e) The rank of $A^T$ is $n$.

(f) The linear system $A\vec{x} = \vec{0}$ only has the unique solution $\vec{x} = \vec{0}$.
(g) The rows (or columns) of $A$ form an independent subset of $\mathbb{R}^n$.
(h) The rows (or columns) of $A$ form a spanning set of $\mathbb{R}^n$.
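Several of these equivalent conditions can be tested together on one matrix. The following sketch reuses the $3 \times 3$ matrix of Example 8.3.3 as an illustrative choice; the right-hand side `b` and the name `checks` are assumptions introduced here.

```mathematica
A = {{-1, 5, -2}, {7, -4, 3}, {10, 6, -9}};
b = {8, 0, -5};

checks = {
  Det[A] != 0,                          (* statement (b) *)
  RowReduce[A] == IdentityMatrix[3],    (* statement (c) *)
  MatrixRank[A] == 3,                   (* statement (d) *)
  MatrixRank[Transpose[A]] == 3,        (* statement (e) *)
  NullSpace[A] == {},                   (* statement (f): only the zero solution *)
  A . LinearSolve[A, b] == b            (* a solution of A.x == b exists *)
  }
```

For this matrix every entry of `checks` should be True. For a singular matrix the first five would all fail together, and the last line may not produce a solution at all.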


Homework Problems 
1. Determine which of the following sets are bases for $\mathbb{R}^2$.

(a) $\{(2, 3), (2, 1)\}$ (b) $\{(-2, 3), (-3, 1)\}$ (c) $\{(1, -2), (-3, 6)\}$
(d) $\{(0, 2), (1, 4)\}$

2. Determine which of the following sets are bases for $\mathbb{R}^3$.

(a) $\{(2, 3, 1), (0, 2, 1), (-1, 2, 1)\}$ (b) $\{(-2, 0, 1), (0, 2, 0), (0, 0, 5)\}$

(c) $\{(1, 1, 0), (0, 1, 1), (1, 0, 1)\}$ (d) $\{(3, -2, 2), (1, -1, 0), (-5, 3, -4)\}$


3. Compute the row rank of the following matrices: 




a ame | -3 -4 3 = E : 
(a)| -2 1 -2 (b)} 4 12 ()| 5 27 
08 1 5 -2 7 aor 
= P ke -3 -4 -3 4 3 0 
(d) (e)} 4 1 O] 4 #12 -1 
EA 5 -2 5 -27 3 
8 -6 4 - ” 


4. Determine the maximum possible row rank for each of the 
matrices from problem 3. 


5. Construct a basis for each of the following subspaces of $\mathbb{R}^n$:
(a) The set of all vectors in $\mathbb{R}^3$ of the form (a, b, a)
(b) The set of all vectors in $\mathbb{R}^4$ of the form (a, b, -a, -b)
(c) The set of all vectors in $\mathbb{R}^3$ of the form (a, b, a - b)
(d) The set of all vectors in $\mathbb{R}^4$ of the form (a, 2b, a - 3b, 2a + 3b + c)


6. Let C = (A|0) be the augmented matrix for the homogeneous
linear system $A\vec{x} = \vec{0}$, where $A \in \mathbb{R}^{m \times n}$. Now apply rref to this
matrix C in order to read off the solutions to this system. You will
find from rref(C) that the solutions $\vec{x}$ are of the form

$\vec{x} = x_{k_1}\vec{u}_1 + x_{k_2}\vec{u}_2 + \cdots + x_{k_p}\vec{u}_p$

where $x_{k_1}, x_{k_2}, \ldots, x_{k_p}$ are arbitrary solution variables from $\vec{x}$, and
$\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_p \in \mathbb{R}^n$ are p fixed solutions. Are the column
vectors $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_p$ automatically a basis of the subspace S of
$\mathbb{R}^n$ consisting of all the solutions $\vec{x}$ to the homogeneous system
$A\vec{x} = \vec{0}$? Explain your answer in detail.

7. Let U and V be two subspaces of $\mathbb{R}^n$. Prove that


$\dim(U + V) = \dim(U) + \dim(V) - \dim(U \cap V)$


8. (a) Let S be a subspace of $\mathbb{R}^n$ with a basis $B = \{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_p\}$.
Explain in detail how you can find a basis for the orthogonal
complement $S^\perp$.


(b) What is the dimension of $S^\perp$ in terms of n and the dimension p
of S? Explain your answer in detail.

9. For the following subspaces S of $\mathbb{R}^n$, find a basis for both S and
its orthogonal complement $S^\perp$, giving the dimension of each:


(a) $S = \{(0, 0, a, b, 0, c, 0) \mid a, b, c \in \mathbb{R}\}$


(b) $S = \{(a, 0, b, c, d, 0, 0, e) \mid a, b, c, d, e \in \mathbb{R}\}$


Mathematica Problems 


1. Determine whether the following sets of vectors are linearly 
independent: 


(a) {(-1, 2, 1), (-3, 4, 0), (-1, 1, 1)}

(b) {(-1, 2, 0, 1), (-3, 4, 0, 2), (-1, 1, 1, 0), (0, 2, 1, 0)}

(c) {(-1, 2, 0, 1), (-3, 4, 0, 2), (-1, 1, 1, 0), (3, -2, 1, -1), (-2, -1, -4, 2)}
(d) {(-1, 2, 0, 1, 0), (-3, 1, 4, 0, 2), (-1, 1, 3, 1, 0), (2, 1, 2, 1, 0)}


(e) {(-1, 2, 0, 1, 0), (-3, 1, 4, 0, 2), (-1, 1, 3, 1, 0), (2, 1, 2, 1, 0), (-3, 5, 9, 3, 2)}


2. Compute the row rank of the following matrices: 


oe oe 
1 $ 1 =9 41 2 
—2 1-2 2 -4 6 7 
aj o gs 1 3]! 2 -5 8 
2-7 5 1 8 2 -2 
e E. 
=$ -4 3 3 
A a oss -3 -43 3 5 1 
(c) (da) | -2 303-32 $8 
2 T 8 4 1 -10 3 -3 9 -5 
0 -13 18 6 
-1 -10 13 5 




af ag 

a! a ee ER E 4 1 
=. 2) 1. a8 <3 5 -2 
(e) | _4 60 6-4 6|®| -2 ı 
S 35: =e 6 eg 0 -2 

3 0 


3. Compute the row rank of the transpose of each matrix from 
problem 2 and compare your answers to those of problem 2. 


4. Let 
$S = \{a\,(2, -3, 7, -5, -6, 4, -9) + b\,(-1, 8, -4, 6, 0, 8, -3) + c\,(9, -7, 31, -19, -30, 28, -48) \mid a, b, c \in \mathbb{R}\}$


Find a basis and the dimension for both S and its orthogonal
complement $S^\perp$.


5. Let
$S = \{a\,(-1, 3, 10, -5, 7) + b\,(9, 2, -4, 6, -8) \mid a, b \in \mathbb{R}\}$


Find a basis and the dimension for both S and its orthogonal
complement $S^\perp$.


8.4 Vector Projection onto a
Subspace of $\mathbb{R}^n$


Since we now know about subspaces, bases, and projections (Section 6.4),
we can combine all these concepts, and will now generalize the process of
vector projection of one vector $\vec{v}$ onto another vector $\vec{w}$ to that of
projecting one vector onto an entire subspace S of $\mathbb{R}^n$. Projecting $\vec{v}$
onto $\vec{w}$ resulted in a vector in the direction of $\vec{w}$. The vector projection
$\mathrm{proj}_S(\vec{v})$ should be the vector of the subspace S for which
$\vec{v} - \mathrm{proj}_S(\vec{v})$ is orthogonal to the subspace S; that is, every element $\vec{w}$ of S
is orthogonal to $\vec{v} - \mathrm{proj}_S(\vec{v})$. In terms of dot products, this implies
that for all $\vec{w} \in S$,


(8.21) $\vec{w} \cdot (\vec{v} - \mathrm{proj}_S(\vec{v})) = 0$


By the linearity of the dot product, it suffices that this formula holds for
all $\vec{w} \in B$, where B is any basis of the subspace S. It may not seem
possible that this condition will determine the vector projection
$\mathrm{proj}_S(\vec{v})$ uniquely, but as we shall see next, it does indeed.
Furthermore, we should also find that $\mathrm{proj}_S(\vec{v}) = \vec{v}$ whenever $\vec{v} \in S$.


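We now derive the process for projecting a vector onto a subspace of $\mathbb{R}^n$.
From the previous section, we are guaranteed that the subspace S of $\mathbb{R}^n$
has a basis $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$. If we choose a fixed $\vec{v} \in \mathbb{R}^n$, then


(8.22) $\vec{w}_i \cdot (\vec{v} - \mathrm{proj}_S(\vec{v})) = 0$


Applying the fact that $\vec{x} \cdot (\vec{y} - \vec{z}) = \vec{x} \cdot \vec{y} - \vec{x} \cdot \vec{z}$, we can now
rewrite equation (8.22) as


(8.23) $\vec{w}_i \cdot \mathrm{proj}_S(\vec{v}) = \vec{w}_i \cdot \vec{v}$


for all $\vec{w}_i \in B$. Since $\mathrm{proj}_S(\vec{v}) \in S$, it can be written as a linear
combination of the vectors of the basis B. So we can define $\mathrm{proj}_S(\vec{v})$ as


(8.24) $\mathrm{proj}_S(\vec{v}) = a_1\vec{w}_1 + a_2\vec{w}_2 + \cdots + a_k\vec{w}_k = \sum_{j=1}^{k} a_j\vec{w}_j$

for unique, and so far unknown, scalars $a_1, a_2, \ldots, a_k$. Plugging this into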


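equation (8.23), and using the linearity of the dot product, gives the
equation


(8.25) $\sum_{j=1}^{k} a_j\,(\vec{w}_i \cdot \vec{w}_j) = \vec{w}_i \cdot \vec{v}$


for $1 \le i \le k$. We now need to determine the $a_j$ values that will satisfy
these equations. To simplify matters, we will express the preceding k
equations in matrix form. First, we explicitly write out the equations:


(8.26)
$a_1\,\vec{w}_1 \cdot \vec{w}_1 + a_2\,\vec{w}_1 \cdot \vec{w}_2 + \cdots + a_k\,\vec{w}_1 \cdot \vec{w}_k = \vec{w}_1 \cdot \vec{v}$
$a_1\,\vec{w}_2 \cdot \vec{w}_1 + a_2\,\vec{w}_2 \cdot \vec{w}_2 + \cdots + a_k\,\vec{w}_2 \cdot \vec{w}_k = \vec{w}_2 \cdot \vec{v}$
$\vdots$
$a_1\,\vec{w}_k \cdot \vec{w}_1 + a_2\,\vec{w}_k \cdot \vec{w}_2 + \cdots + a_k\,\vec{w}_k \cdot \vec{w}_k = \vec{w}_k \cdot \vec{v}$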


Next, we recognize that the LHS of the system given in equation (8.26) 
can be expressed as a matrix multiplication, so we now have the following 
expression: 


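(8.27)
$\begin{bmatrix} \vec{w}_1 \cdot \vec{w}_1 & \vec{w}_1 \cdot \vec{w}_2 & \cdots & \vec{w}_1 \cdot \vec{w}_k \\ \vec{w}_2 \cdot \vec{w}_1 & \vec{w}_2 \cdot \vec{w}_2 & \cdots & \vec{w}_2 \cdot \vec{w}_k \\ \vdots & \vdots & \ddots & \vdots \\ \vec{w}_k \cdot \vec{w}_1 & \vec{w}_k \cdot \vec{w}_2 & \cdots & \vec{w}_k \cdot \vec{w}_k \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix} = \begin{bmatrix} \vec{w}_1 \cdot \vec{v} \\ \vec{w}_2 \cdot \vec{v} \\ \vdots \\ \vec{w}_k \cdot \vec{v} \end{bmatrix}$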


So, now we see that if A is the k x 1 column (vector) of the $a_j$ scalars, then
W A = V, where the k x k matrix W is defined entrywise by $W_{i,j} = \vec{w}_i \cdot \vec{w}_j$
and the column matrix (vector) V is defined similarly by $V_i = \vec{w}_i \cdot \vec{v}$.
This matrix equation is now simple to solve via matrix arithmetic. By
using the inverse of W, we have $A = W^{-1}V$.
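
As a small illustration of solving W A = V, the sketch below builds W and V for a hypothetical basis and vector (not taken from the text) and compares the inverse-based solution with Mathematica's built-in LinearSolve, which solves the system without forming the inverse explicitly.

```mathematica
(* Hypothetical basis and vector, chosen only for illustration *)
basis = {{1, 1, 0}, {0, 1, 1}};
v = {1, 2, 3};

(* Dot product matrix W and right-hand side V *)
W = basis.Transpose[basis];
V = basis.v;

(* Coefficients a_j computed two ways; the comparison should be True *)
coeffs = LinearSolve[W, V];
coeffs == Inverse[W].V
```

For a basis with only a few vectors the two approaches are interchangeable; LinearSolve is usually the better habit when k is large.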

We will now perform some matrix manipulations to simplify the process
of computing $\mathrm{proj}_S(\vec{v})$. First, we note that we can express the projection
as


(8.28) $\mathrm{proj}_S(\vec{v}) = (a_1, a_2, \ldots, a_k) \cdot (\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k)$


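Next, we will use the following fact: If $\vec{u}, \vec{v} \in \mathbb{R}^n$ are column vectors,
the dot product $\vec{u} \cdot \vec{v}$ can be expressed as the matrix multiplication
$U^T V$, where U and V are the column matrices corresponding to $\vec{u}$ and $\vec{v}$,
respectively. Thus, the formula given in equation (8.28) can be expressed
in terms of the matrix multiplication


(8.29) $\mathrm{proj}_S(\vec{v}) = \begin{bmatrix} a_1 & a_2 & \cdots & a_k \end{bmatrix} \begin{bmatrix} \vec{w}_1 \\ \vec{w}_2 \\ \vdots \\ \vec{w}_k \end{bmatrix} = A^T M_B$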

where A is the column matrix corresponding to the column vector whose
components are the $a_j$ values, and $M_B$ is the matrix whose rows are the $\vec{w}_j$
vectors. Now remember, we are given the $\vec{w}_j$ vectors, so if we can use
these vectors, along with the vector $\vec{v}$, exclusively in our formula, it
would make the process of computing $\mathrm{proj}_S(\vec{v})$ programmatically
simple. To this end we define $C = M_B^T$, explicitly given by


(8.30) $C = \begin{bmatrix} \vec{w}_1 & \vec{w}_2 & \cdots & \vec{w}_k \end{bmatrix}$


As a result of (8.27), we know that $A = W^{-1}V$. Furthermore, since W is the
dot product matrix, we can express W as $W = C^T C$, and in a similar
fashion, $V = C^T \vec{v}$. Putting all of these facts together gives


(8.31)
$\mathrm{proj}_S(\vec{v}) = A^T M_B$
$= (W^{-1}V)^T C^T$
$= (C\,(W^{-1}V))^T$
$= (C\,(C^T C)^{-1} C^T \vec{v})^T$


Notice that this formula involves only the matrix C and the vector $\vec{v}$. C,
as defined in (8.30), is simply the matrix whose columns are the basis
vectors $\vec{w}_j$, for $1 \le j \le k$. If we wish to express $\mathrm{proj}_S(\vec{v})$ as a column vector,
notice that we can remove the outer transpose in the last line of equation
(8.31):


(8.32) $\mathrm{proj}_S(\vec{v})^T = C\,(C^T C)^{-1} C^T \vec{v}$


In the following examples, we will use the column form of the projection
vector given in (8.32).
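
The formula $C\,(C^T C)^{-1} C^T \vec{v}$ can be wrapped in a short reusable function. This is only a sketch: the name projOntoSubspace and the sample basis are hypothetical, and the input vectors are assumed to be linearly independent so that the inverse exists. The local name c is lowercase because C is a protected symbol in Mathematica.

```mathematica
(* projOntoSubspace is a hypothetical helper name, not from the text.
   basis: list of linearly independent vectors w_1, ..., w_k
   v:     the vector to project onto their span *)
projOntoSubspace[basis_, v_] := Module[{c = Transpose[basis]},
  c.Inverse[Transpose[c].c].Transpose[c].v]

basis = {{1, 1, 0}, {0, 1, 1}};   (* illustrative basis *)
v = {1, 2, 3};
proj = projOntoSubspace[basis, v]

(* The residual v - proj should be orthogonal to every basis vector *)
basis.(v - proj)

(* A vector that already lies in S should be returned unchanged *)
u = 2 basis[[1]] - 5 basis[[2]];
projOntoSubspace[basis, u] == u
```

The last test reflects the algebra: if $\vec{u} = C\vec{a}$ for some coefficient vector $\vec{a}$, then $C\,(C^T C)^{-1} C^T C\,\vec{a} = C\vec{a} = \vec{u}$.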


Example 8.4.1. As a first example, we define S to be the subspace of $\mathbb{R}^3$ with basis
$B = \{\vec{w}_1 = (5, -2, 9),\ \vec{w}_2 = (-3, 7, 1)\}$. Let us find and plot
$\mathrm{proj}_S(\vec{v})$ for $\vec{v} = (4, 8, -15)$. We will use rref to see that $\vec{v}$ is not in S.


w1 = {5, -2, 9}; w2 = {-3, 7, 1};
(wrow = {w1, w2}) // MatrixForm


5 -2 9
-3 7 1


v = {4, 8, -15};

RowReduce[Join[wrow, {v}]] // MatrixForm
1 0 0
0 1 0
0 0 1


W = {{0, 0}, {0, 0}};
For[k = 1, k <= 2, k = k + 1,
For[l = 1, l <= 2, l = l + 1,
W[[k, l]] = wrow[[k]].wrow[[l]];]];

W // MatrixForm


110 -20
-20 59

V = {{0}, {0}};


For[k = 1, k <= 2, k = k + 1, V[[k]] = wrow[[k]].v;];
V // MatrixForm


-131
29


(A = Inverse[W].V) // MatrixForm


-2383/2030
19/203


ProjvontoS = Sum[A[[k]] wrow[[k]], {k, 1, 2}]


{-2497/406, 3048/1015, -733/70}


N[ProjvontoS - v]


{-10.1502, -4.99704, 4.52857}


(v - ProjvontoS).w1

0

(v - ProjvontoS).w2

0

Let us now see that the formula (8.31) gives the same solution as the previous method. 


(CMat = Transpose[wrow]) // MatrixForm 


5 -3
-2 7
9 1


CMat.Inverse[Transpose[CMat].CMat].Transpose[CMat].v // MatrixForm


-2497/406
3048/1015
-733/70


Everything checks out and $\vec{v} - \mathrm{proj}_S(\vec{v})$ is orthogonal to the subspace S,
which in this case is the plane through the origin spanned by the basis B as defined
previously. Now we should plot all of this in $\mathbb{R}^3$ to see it geometrically (see Fig. 8.3).


Cp = Cross[w1, w2] 


{-65, -32, 29}
PlaneS = Cp[[1]] x + Cp[[2]] y + Cp[[3]] z == 0
-65 x - 32 y + 29 z == 0


PlaneSPlot = ContourPlot3D[Evaluate[PlaneS], {x, -12, 12}, {y, -12, 12}, {z, -12,
12}, Mesh -> None, ContourStyle -> {Opacity[0.5], LightBlue}];


ArrowPlots = Graphics3D[{PlotPoints -> 1, Arrowheads[.05], Thickness[0.010], Red,
Arrow[{{0, 0, 0}, w1}], Black, Arrow[{{0, 0, 0}, w2}], Blue, Arrow[{{0, 0, 0},
ProjvontoS}], Black, Arrow[{{0, 0, 0}, v}], Yellow, Arrow[{ProjvontoS, v}]}];


TxtPlots = Graphics3D[{Black, Text["w1", {5, -2, 10}], Text["w2", {-3, 7, 2}],
Text["Proj", {-2, 2.5, -5}], Text["v", {4, 5, -10}], Text["v-Proj", {0, 5, -14}]}];
Show[PlaneSPlot, ArrowPlots, TxtPlots, PlotRange -> All, AxesLabel -> {"x", "y",
"z"}]


Figure 8.3: The projection of a vector $\vec{v}$ onto a plane in $\mathbb{R}^3$.


In Figure 8.3, the two vectors $\vec{w}_1$ and $\vec{w}_2$ form the basis for the plane. The triangle that
has one edge on the plane has hypotenuse $\vec{v}$; the edge in the plane is the projection
onto the plane of $\vec{v}$, while the edge normal to the plane is the difference between
$\vec{v}$ and the projection. The distance from the terminal point of $\vec{v}$ to the plane
defined by B previously is the length of the vector $\vec{v} - \mathrm{proj}_S(\vec{v})$, since it is the
shortest distance from the point $\vec{v}$ to any point of the plane S. This idea can be
generalized to any subspace S and point $\vec{v}$ of $\mathbb{R}^n$.
N[Norm[v - ProjvontoS]]


12.1863 


What we have done in the command above is compute the shortest distance from the
point to the plane. If you recall, this idea was discussed in Section 6.4. Remember that
any plane in $\mathbb{R}^3$ has an equation of the form $ax + by + cz = d$, where the vector
$\vec{n} = (a, b, c)$ is the normal vector to the plane. The shortest distance from a
point $P(x_0, y_0, z_0)$ to this plane should have the following formula:


$\dfrac{|a x_0 + b y_0 + c z_0 - d|}{\sqrt{a^2 + b^2 + c^2}}$


Let us check this formula’s value for the plane S through the origin and parallel to both
$\vec{w}_1$ and $\vec{w}_2$ and the terminal point of $\vec{v}$ of the last example. We can take the normal
vector $\vec{n}$ to this plane to be $\vec{n} = \vec{w}_1 \times \vec{w}_2$, which is denoted Cp in the Mathematica
code above and below.


Abs[N[Cp.v/Norm[Cp]]]
12.1863 


In doing this vector projection, we computed the vector projection
$\mathrm{proj}_S(\vec{v})$ of $\vec{v}$ onto the subspace S using any basis B of S. One may
wonder whether some bases are “better” than others for a vector space. If
so, what properties would be desired for vectors in a “better” basis?

Definition 8.4.1. A basis $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ of a subspace S is said to
be orthogonal if $\vec{w}_i \cdot \vec{w}_j = 0$ for all $1 \le i \ne j \le k$.


Definition 8.4.2. A basis $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ of a subspace S is said to
be orthonormal if it is an orthogonal basis with the added property that
each vector has unit length, that is, $|\vec{w}_i| = \sqrt{\vec{w}_i \cdot \vec{w}_i} = 1$ for $1 \le i \le k$.


As an obvious example, the standard basis for $\mathbb{R}^n$ is an orthonormal basis,
and is given by the following set:


{(1, 0, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, 0, ..., 0, 1)}


Now one must ask the question: Why is this a “better”, or perhaps even
“best”, basis to use? Relating this basis to our discussion of projecting
vectors onto subspaces, notice that the k x k matrix W with $W_{i,j} = \vec{w}_i \cdot \vec{w}_j$
is the k x k identity matrix $I_k$ if the basis B is orthonormal, and as a result,
$C^T C = I_k$. This holds true not only for vectors chosen from the standard
basis but also for vectors chosen from any orthonormal basis. The vector
projection of $\vec{v}$ onto S simplifies quite a bit to


(8.33) $\mathrm{proj}_S(\vec{v}) = (\vec{w}_1 \cdot \vec{v})\,\vec{w}_1 + (\vec{w}_2 \cdot \vec{v})\,\vec{w}_2 + \cdots + (\vec{w}_k \cdot \vec{v})\,\vec{w}_k = \sum_{j=1}^{k} (\vec{w}_j \cdot \vec{v})\,\vec{w}_j$


Referring back to the matrix expression of the projection formula, we see
that under the assumption that $C^T C = I_k$, equation (8.32) simplifies to


(8.34) $\mathrm{proj}_S(\vec{v}) = (C\,C^T \vec{v})^T$
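
The simplification for an orthonormal basis can be checked numerically. The sketch below uses a hypothetical basis and vector (not from the text) together with Mathematica's built-in Orthogonalize command to obtain orthonormal vectors, then compares the sum form (8.33) with the matrix form (8.34). The name cMat is used because C is a protected symbol.

```mathematica
(* Illustrative basis and vector, not taken from the text *)
basis = {{1, 1, 0}, {0, 1, 1}};
v = {1, 2, 3};

ortho = Orthogonalize[basis];   (* rows are orthonormal vectors *)
cMat = Transpose[ortho];        (* columns are the orthonormal basis vectors *)

(* For an orthonormal basis the dot product matrix is the identity *)
Simplify[Transpose[cMat].cMat] == IdentityMatrix[2]

(* Formula (8.33) as a sum and formula (8.34) as a matrix product *)
viaSum = Simplify[Sum[(ortho[[j]].v) ortho[[j]], {j, 1, 2}]];
viaMatrix = Simplify[cMat.Transpose[cMat].v];
viaSum == viaMatrix
```

Both comparisons should evaluate to True; neither form requires a matrix inverse, which is the practical payoff of working with an orthonormal basis.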


Formulas (8.33) and (8.34) are both very elegant and computationally 
simple as long as we can always find an orthonormal basis for a subspace 
S from any given basis B of S. In effect, what this states is that, given an 


orthonormal basis, any vector $\vec{v}$ in the subspace can always be
expressed as a linear combination of those basis vectors with the
coefficients in front of each basis vector being the dot product of the basis
vector with $\vec{v}$. So the question now becomes: Is there a systematic
process that takes a basis B and from it, generates an orthonormal basis? A 
procedure called the Gram—Schmidt orthonormalization process exists to 
do precisely this. We conclude this section with a theorem, whose proof 
will be left as an exercise. 

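Theorem 8.4.1. Let S be a subspace of $\mathbb{R}^n$. A basis $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$
is an orthonormal basis of S if and only if for all vectors $\vec{v} \in S$,

$\vec{v} = \sum_{j=1}^{k} (\vec{w}_j \cdot \vec{v})\,\vec{w}_j$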


Homework Problems 
1. Verify that if S is a one-dimensional subspace of $\mathbb{R}^n$ spanned by
the single vector $\vec{w}$, then the formula for $\mathrm{proj}_S(\vec{v})$ reduces to the
formula given by $\mathrm{proj}_{\vec{w}}(\vec{v})$.
2. Given the basis $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ for a subspace of $\mathbb{R}^n$,
prove that the matrix $W \in \mathbb{R}^{k \times k}$ defined by $W_{i,j} = \vec{w}_i \cdot \vec{w}_j$
satisfies $W^T = W$.

3. Given the basis $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ for a subspace of $\mathbb{C}^n$, and
$W \in \mathbb{C}^{k \times k}$ defined by $W_{i,j} = \vec{w}_i \cdot \vec{w}_j$, determine a relationship
between W and $W^T$.


4. Project the following vectors on the subspace of $\mathbb{R}^3$ generated by
the basis {(1, 1, 0), (0, 0, 1)}


(a) (-1, 2, 1) (b) (-1, 2, 0) (c) (0, 2, 3)


5. Modify the vector projection formula given in equation (8.31) for
the case when the vectors are complex valued.


6. Project the vector $\vec{v} = (-2 + i,\ 6 - i,\ 3 + 2i)$ onto the
subspace of $\mathbb{C}^3$ generated by the basis
{(i, 1, 1 + i), (0, i, 1 - i)}

7. Prove that the dot product matrix W, when considering a set of 
orthogonal vectors, is diagonal. Furthermore, what do the values on 
the diagonal correspond to? 

8. Determine if each of the following sets of vectors constitute an 
orthogonal set: 


(a) {(1, 0, -1), (0, 1, 0), (1, 0, 1)}

(b) {(1, 1, 1), (2, -2, 2), (6, 0, -6)}

(c) {(1, 2, 1), (2, -2, 2), (6, 0, -6)}

(d) {(1, 1, -1, 1), (0, 1, 0, -1), (-2, 1, 2, 2)}


(e) {(1, 1, -1, 1), (0, 1, 0, -1), (-3, 2, 1, 2)}

9. Convert each set from problem 8 that was determined to be an
orthogonal set into an orthonormal set.

10. Compute the distance from the plane spanned by the vectors
{(1, 0, 1), (1, 2, -1)} to the point (8, 8, 1).

11. Prove Theorem 8.4.1. 

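12. Is vector projection linear? In other words, for a subspace S of
$\mathbb{R}^n$ and two vectors $\vec{u}, \vec{v} \in \mathbb{R}^n$ and scalar c, determine whether
the following two properties hold. If the properties do not hold, give
an example.


(a) $\mathrm{proj}_S(c\,\vec{v}) = c\,\mathrm{proj}_S(\vec{v})$
(b) $\mathrm{proj}_S(\vec{u} + \vec{v}) = \mathrm{proj}_S(\vec{u}) + \mathrm{proj}_S(\vec{v})$


13. Let S be a subspace of $\mathbb{R}^n$ with basis $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$
and let $\vec{v} \in \mathbb{R}^n$. Determine the conditions on $\vec{v}$ under which the
following condition holds:


$\mathrm{proj}_S(\vec{v}) = \mathrm{proj}_{\vec{w}_1}(\vec{v}) + \mathrm{proj}_{\vec{w}_2}(\vec{v}) + \cdots + \mathrm{proj}_{\vec{w}_k}(\vec{v})$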
14. What is (Projs( Y ))> 


15. Let S be a subspace of $\mathbb{R}^n$ where $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ is a
basis of S. Can we extend B into a full basis of all of $\mathbb{R}^n$? (Hint:
Consider a basis of the orthogonal complement $S^\perp$.)


16. (a) Let S be a subspace of $\mathbb{R}^n$ with basis $B = \{\vec{w}_1, \vec{w}_2, \vec{w}_3\}$.
Show that the set $B_1 = \{\vec{q}_1, \vec{q}_2, \vec{q}_3\}$ is an orthogonal basis of S,
where 


(b) Generalize part (a) to a subspace S of any dimension. 


17. Let U and V be two subspaces of $\mathbb{R}^n$. For $\vec{w} \in \mathbb{R}^n$, find
conditions on U and V such that


$\mathrm{proj}_{U+V}(\vec{w}) = \mathrm{proj}_U(\vec{w}) + \mathrm{proj}_V(\vec{w})$
18. (Fourier series) Let V be the real vector space of all continuous
functions $f : [0, 2\pi] \to \mathbb{R}$. Define the dot product of two
elements f(x) and g(x) of V by


$f(x) \cdot g(x) = \int_0^{2\pi} f(x)\,g(x)\,dx$


Compute the vector projection projs( T), for Y =e€ V, and S, the 
subspace of V, having basis 


sin(x), cos(2x), ea sin(22)} 


{7 Fe meds Fe e) Fe Va 


(You should first check whether B is an orthonormal set.) Now graph
together both $\vec{v}$ and $\mathrm{proj}_S(\vec{v})$. Is $\mathrm{proj}_S(\vec{v})$ a reasonable
approximation of $\vec{v}$? How can you improve this approximation?


Mathematica Problems 


1. For each of the following sets of vectors, compute the dot product
matrix W, where $W_{i,j} = \vec{w}_i \cdot \vec{w}_j$:


(a) {(-1, 0, 2, 1), (1, -5, 6, 1), (2, 1, 0, -3)}

(b) {(-1, 0, 2, 1, 0), (1, -5, 6, 1, -1), (2, 1, 0, -3, -2)}

(c) {(-1, 0, 2, 1, 0), (1, -5, 6, 1, -1), (2, 1, 0, -3, -2), (0, 0, 2, 1, 0)}
(d) {(1 + i, 1 - i, i), (1 - 2i, 0, 1), (0, 1 + i, 1 - 2i)}


(e) {(1 + i, 1 - i, i), (1 - 2i, 0, 1)}

2. Verify that $W^T = W$ for parts (a)-(c) of problem 1. For parts (d)
and (e) of problem 1, verify that W satisfies the property given as
the solution to homework problem 3.


3. For each part of homework problem 4, construct a graph similar 
to that depicted in Figure 8.3. Include the plane, the two basis 


vectors for the plane, the projection of the vector $\vec{v}$ into the plane,
along with the original vector $\vec{v}$.


4. Determine whether each of the following sets of vectors 
constitute an orthogonal set of vectors by computing the dot product 
matrix W: 


(a) {(1,0, —1), (0, 1,0), (1,0, 1)} 

(b) {(1, 1, 1), (2, —2, 2), (6,0, —6)} 

(c) {(1, 2, 1), (2, —2, 2), (6, 0, —6)} 

(d) {(1, 1, —1, 1), (0, 1,0, —1), (—2, 1, 2, 2)} 


(e) {(1,1,—1, 1), (0, 1,0, —1), (1,0, 1, 0)} 


5. Project the following vectors onto the subspaces generated 
by the given bases from problem 1: 


(a) (2,3, 1, 2) onto the basis from 1 (a) 
(b) (3,0, —6, —3) onto the basis from 1 (a) 


(c) (2, -4, -8, -1, -3) onto the basis from 1 (b)
(d) (2, -4, -8, -1, -3) onto the basis from 1 (c)
(e) (2, -4, 8, -1, 2) onto the basis from 1 (b)
(f) (2, -4, 8, -1, 2) onto the basis from 1 (c)
(g) (1, 9, 1) onto the basis from 1 (d)
(h) (0, i, 1) onto the basis from 1 (e)
(i) (1 - 2i, 1 + i, 2 - 2i) onto the basis from 1 (e)
6. Using homework problem 16, find an orthogonal basis $B_1$ for the
subspace S of $\mathbb{R}^5$ having the following basis:


B = {(-2, 7, 4, 1, 9), (6, 1, -3, -1, 7), (13, 0, -8, 2, 9)}


First, check that this set B is independent, and next check that $B_1$ is
orthogonal.


8.5 The Gram-Schmidt 
Orthonormalization Process 


At the end of Section 8.4, it was pointed out that if we have an 
orthonormal set of vectors, vector projections onto subspaces become 


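computationally simple. If $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ is a basis for a subspace
S of $\mathbb{R}^n$, our goal in this section is to convert B to an orthogonal basis
$Q = \{\vec{q}_1, \vec{q}_2, \ldots, \vec{q}_k\}$ of S, where $\vec{q}_i \cdot \vec{q}_j = 0$ for all $i \ne j$. Once we have a
set of orthogonal vectors, to make them orthonormal, we simply divide
each vector by its magnitude. The result is an orthonormal basis for S of
the form


$P = \left\{ \dfrac{\vec{q}_1}{|\vec{q}_1|}, \dfrac{\vec{q}_2}{|\vec{q}_2|}, \ldots, \dfrac{\vec{q}_k}{|\vec{q}_k|} \right\}$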


In order to start the procedure for calculating the $\vec{q}_j$ vectors, we let $\vec{q}_1 = \vec{w}_1$.
Now, to obtain $\vec{q}_2$, we first assume that $\vec{q}_2 = a\,\vec{q}_1 + \vec{w}_2$, for some
unknown scalar a. In other words, we wish to express $\vec{q}_2$ as a sum of two
vectors, with the first in the direction of $\vec{q}_1$. So now we find the value of
the scalar a so that $\vec{q}_1$ and $\vec{q}_2$ are orthogonal. We arrive at the following
string of equations:


$\vec{q}_1 \cdot \vec{q}_2 = 0$
$\vec{q}_1 \cdot (a\,\vec{q}_1 + \vec{w}_2) = 0$
$a\,\vec{q}_1 \cdot \vec{q}_1 + \vec{q}_1 \cdot \vec{w}_2 = 0$
Solving for a gives
$a = -\dfrac{\vec{q}_1 \cdot \vec{w}_2}{\vec{q}_1 \cdot \vec{q}_1} = -\dfrac{\vec{q}_1 \cdot \vec{w}_2}{|\vec{q}_1|^2}$
Therefore
$\vec{q}_2 = \vec{w}_2 - \dfrac{\vec{q}_1 \cdot \vec{w}_2}{|\vec{q}_1|^2}\,\vec{q}_1$
(8.35) $= \vec{w}_2 - \mathrm{proj}_{\vec{q}_1}(\vec{w}_2)$


Now we have two orthogonal vectors $\{\vec{q}_1, \vec{q}_2\}$, which span the same
subspace as $\{\vec{w}_1, \vec{w}_2\}$. The idea that these two different bases span the
same subspace is very important, so be sure that you understand why this
is so.


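To get $\vec{q}_3$, we assume that $\vec{q}_3 = a\,\vec{q}_1 + b\,\vec{q}_2 + \vec{w}_3$, and then find the values
of the scalars a and b so that $\vec{q}_1 \cdot \vec{q}_3 = 0$ and $\vec{q}_2 \cdot \vec{q}_3 = 0$. To solve for a,
we first expand the condition $\vec{q}_1 \cdot \vec{q}_3 = 0$:


$\vec{q}_1 \cdot \vec{q}_3 = 0$
$\vec{q}_1 \cdot (a\,\vec{q}_1 + b\,\vec{q}_2 + \vec{w}_3) = 0$
$a\,\vec{q}_1 \cdot \vec{q}_1 + b\,\vec{q}_1 \cdot \vec{q}_2 + \vec{q}_1 \cdot \vec{w}_3 = 0$
$a\,\vec{q}_1 \cdot \vec{q}_1 + \vec{q}_1 \cdot \vec{w}_3 = 0$


Here we explicitly applied the fact that $\vec{q}_1$ and $\vec{q}_2$ are orthogonal to
simplify the above expression. Solving for a gives


$a = -\dfrac{\vec{q}_1 \cdot \vec{w}_3}{\vec{q}_1 \cdot \vec{q}_1} = -\dfrac{\vec{q}_1 \cdot \vec{w}_3}{|\vec{q}_1|^2}$


In a similar fashion, using the expression $\vec{q}_2 \cdot \vec{q}_3 = 0$ allows us to solve
for b:


$\vec{q}_2 \cdot \vec{q}_3 = 0$
$\vec{q}_2 \cdot (a\,\vec{q}_1 + b\,\vec{q}_2 + \vec{w}_3) = 0$
$a\,\vec{q}_2 \cdot \vec{q}_1 + b\,\vec{q}_2 \cdot \vec{q}_2 + \vec{q}_2 \cdot \vec{w}_3 = 0$
$b\,\vec{q}_2 \cdot \vec{q}_2 + \vec{q}_2 \cdot \vec{w}_3 = 0$
Therefore
$b = -\dfrac{\vec{q}_2 \cdot \vec{w}_3}{\vec{q}_2 \cdot \vec{q}_2} = -\dfrac{\vec{q}_2 \cdot \vec{w}_3}{|\vec{q}_2|^2}$


Now we have the values of the scalars a and b, and thus a complete
expression for $\vec{q}_3$:


$\vec{q}_3 = \vec{w}_3 - \dfrac{\vec{q}_1 \cdot \vec{w}_3}{|\vec{q}_1|^2}\,\vec{q}_1 - \dfrac{\vec{q}_2 \cdot \vec{w}_3}{|\vec{q}_2|^2}\,\vec{q}_2$
(8.36) $= \vec{w}_3 - \mathrm{proj}_{\vec{q}_1}(\vec{w}_3) - \mathrm{proj}_{\vec{q}_2}(\vec{w}_3)$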


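This process clearly follows a pattern, so we now have a systematic way
of computing the jth vector in the orthogonal set. The formula is as
follows:


(8.37)
$\vec{q}_j = \vec{w}_j - \dfrac{\vec{w}_j \cdot \vec{q}_1}{|\vec{q}_1|^2}\,\vec{q}_1 - \dfrac{\vec{w}_j \cdot \vec{q}_2}{|\vec{q}_2|^2}\,\vec{q}_2 - \cdots - \dfrac{\vec{w}_j \cdot \vec{q}_{j-1}}{|\vec{q}_{j-1}|^2}\,\vec{q}_{j-1} = \vec{w}_j - \sum_{i=1}^{j-1} \dfrac{\vec{w}_j \cdot \vec{q}_i}{|\vec{q}_i|^2}\,\vec{q}_i$


for $2 \le j \le k$, where $\vec{q}_1 = \vec{w}_1$. We can rewrite expression (8.37) in terms of
projections, similar to the definitions of $\vec{q}_2$ and $\vec{q}_3$:


(8.38) $\vec{q}_j = \vec{w}_j - \sum_{i=1}^{j-1} \mathrm{proj}_{\vec{q}_i}(\vec{w}_j)$


It should be clear that the subspace spanned by $\{\vec{q}_1, \vec{q}_2, \ldots, \vec{q}_j\}$ is the
same as that spanned by $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_j\}$, for each $1 \le j \le k$. Normalizing
each $\vec{q}_j$ to unit length will give us our orthonormal basis P of the subspace
S.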
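
The recursion in (8.37) and (8.38) translates almost line for line into a short Mathematica function. The following is a minimal sketch, not the authors' implementation: the name gramSchmidt and the sample input are hypothetical, and the input vectors are assumed to be linearly independent so that no $\vec{q}_i$ is the zero vector.

```mathematica
(* gramSchmidt is a hypothetical helper name, not from the text.
   vecs: a list of linearly independent vectors w_1, ..., w_k.
   Returns the orthogonal list {q_1, ..., q_k}. *)
gramSchmidt[vecs_] := Module[{qs = {}, q},
  Do[
    (* subtract from w its projections onto the q_i found so far;
       for the first vector the sum is empty, so q_1 = w_1 *)
    q = w - Sum[(w.qi/(qi.qi)) qi, {qi, qs}];
    AppendTo[qs, q],
    {w, vecs}];
  qs]

qs = gramSchmidt[{{1, 1, 0}, {0, 1, 1}, {1, 0, 1}}]   (* illustrative input *)

(* Off-diagonal entries of the dot product matrix should all be zero *)
qs.Transpose[qs] // MatrixForm

(* Normalize each q_j to obtain the orthonormal basis P *)
ps = Map[#/Norm[#] &, qs]
```

Collecting the $\vec{q}_j$ in a list rather than in individually named variables keeps the function independent of how many vectors are supplied.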


Example 8.5.1. To illustrate this, we will redo the subspace projection example 8.4.1
from Section 8.4. The basis for the subspace S was given by the set B =
{(5, -2, 9), (-3, 7, 1)} and the vector $\vec{v} = (4, 8, -15)$ was chosen to
be projected onto the subspace. We should end up with the same result,
$\mathrm{proj}_S(\vec{v}) = \left(-\frac{2497}{406}, \frac{3048}{1015}, -\frac{733}{70}\right)$. However, we first construct an
orthonormal basis from our original pair of basis vectors:
w1 = {5, -2, 9}; w2 = {-3, 7, 1};
v = {4, 8, -15};


q1 = w1;


q2 = w2 - (w2.q1/q1.q1) q1
{-23/11, 73/11, 29/11}


q1.q2
0

The last calculation above shows that the set {(5, -2, 9), (-23/11, 73/11, 29/11)}
is an orthogonal basis for the plane S. Now, we can use the much simpler formulation of
projection given in equation (8.33), assuming that we first divide both of the vectors in
the set given above by their magnitude to create unit-length vectors:


p1 = q1/Norm[q1]

{Sqrt[5/22], -Sqrt[2/55], 9/Sqrt[110]}

p2 = q2/Norm[q2]

{-23/Sqrt[6699], 73/Sqrt[6699], Sqrt[29/231]}


ProjvOntoS = Simplify[(v.p1) p1 + (v.p2) p2]


{-2497/406, 3048/1015, -733/70}


The vector projection of $\vec{v}$ onto S is the same, independent of the basis used to
represent S.


Example 8.5.2. Now we will do one more example, this time with vectors in $\mathbb{R}^5$,
starting with the basis:


$B = \{\vec{w}_1 = (1, -2, 1, 0, 1),\ \vec{w}_2 = (-2, -2, 0, 2, 7),\ \vec{w}_3 = (4, 1, 2, 0, 3),\ \vec{w}_4 = (-1, 0, 0, 0, 1)\}$


We will implement the Gram-Schmidt orthonormalization to create an orthonormal basis
P of the subspace S of $\mathbb{R}^5$ whose basis is B. Pay special attention to the way in which
we have Mathematica perform this.

w1 = {1, -2, 1, 0, 1}; w2 = {-2, -2, 0, 2, 7};

w3 = {4, 1, 2, 0, 3}; w4 = {-1, 0, 0, 0, 1};

RowReduce[{w1, w2, w3, w4}] // MatrixForm


1 0 0 0 -1
0 1 0 0 3/5
0 0 1 0 16/5
0 0 0 1 31/10


Notice that since the RowReduce command gives no all-zero rows, the set B is a set of
linearly independent vectors.


wlist = {w1, w2, w3, w4};
For[k = 1, k <= 4, k = k + 1,
q[k] = Simplify[wlist[[k]] - Sum[(wlist[[k]].q[j]/Norm[q[j]]^2) q[j], {j, 1, k - 1}]];
p[k] = q[k]/Norm[q[k]];];
List[q[1], q[2], q[3], q[4]]
{{1, -2, 1, 0, 1}, {-23/7, 4/7, -9/7, 2, 40/7},
{542/173, 515/173, 182/173, -14/173, 306/173},
{-1707/7930, 58/793, 181/610, -1463/3965, 257/3965}}


List[p[1], p[2], p[3], p[4]]


The resulting unit vectors, written here with each vector factored as a scalar multiple of an integer vector, are
p[1] = (1/Sqrt[7]) {1, -2, 1, 0, 1}
p[2] = (1/Sqrt[2422]) {-23, 4, -9, 14, 40}
p[3] = (1/Sqrt[685945]) {542, 515, 182, -14, 306}
p[4] = (1/Sqrt[17612530]) {-1707, 580, 2353, -2926, 514}


W = ConstantArray[0, {4, 4}];
For[k = 1, k <= 4, k = k + 1,
For[j = 1, j <= 4, j = j + 1,
W[[k, j]] = Simplify[p[k].p[j]];];];
W // MatrixForm


1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1


Therefore P is an orthonormal basis of the subspace of $\mathbb{R}^5$ corresponding to the basis B.


As a side note, one can combine the two steps of the Gram-Schmidt
orthonormalization process in a more integrated fashion. Instead of
computing the $\vec{q}_j$ vectors first and then making them all unit length, we
can make $\vec{q}_1$ a unit vector first, by replacing $\vec{q}_1$ with $\vec{w}_1/|\vec{w}_1|$, and then
continuing on to $\vec{q}_2$ from equation (8.35), in which case we get

$\vec{q}_2 = \vec{w}_2 - \dfrac{\vec{q}_1 \cdot \vec{w}_2}{|\vec{q}_1|^2}\,\vec{q}_1 = \vec{w}_2 - (\vec{q}_1 \cdot \vec{w}_2)\,\vec{q}_1$


Now, we perform an intermediate step before computing $\vec{q}_3$, namely,
making $\vec{q}_2$ unit length as well. So we perform the reassignment
$\vec{q}_2 \to \vec{q}_2/|\vec{q}_2|$. We see that $\vec{q}_3$ from equation (8.36) is now given by

$\vec{q}_3 = \vec{w}_3 - (\vec{w}_3 \cdot \vec{q}_1)\,\vec{q}_1 - (\vec{w}_3 \cdot \vec{q}_2)\,\vec{q}_2$


This alternative method of the Gram-Schmidt orthonormalization process
can be expressed as follows for each step in the process:


$\vec{q}_{j-1} \to \dfrac{\vec{q}_{j-1}}{|\vec{q}_{j-1}|}$ (part 1)
$\vec{q}_j = \vec{w}_j - \sum_{i=1}^{j-1} (\vec{w}_j \cdot \vec{q}_i)\,\vec{q}_i$ (part 2)


for $2 \le j \le k$, where $\vec{q}_1 = \vec{w}_1$. Finally, as one last step we must perform
a normalization of $\vec{q}_k$. As a complete process, we have


$\vec{q}_1 = \vec{w}_1$ (step 1)
For $2 \le j \le k$: (step 2)
$\vec{q}_{j-1} \to \dfrac{\vec{q}_{j-1}}{|\vec{q}_{j-1}|}$
$\vec{q}_j = \vec{w}_j - \sum_{i=1}^{j-1} (\vec{w}_j \cdot \vec{q}_i)\,\vec{q}_i$
$\vec{q}_k \to \dfrac{\vec{q}_k}{|\vec{q}_k|}$ (step 3)


Now we take the previous basis B and use the new method to produce the 
orthonormal basis P: 


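wlist = {w1, w2, w3, w4};
For[k = 1, k <= 4, k = k + 1,
s[k] = Simplify[wlist[[k]] - Sum[(wlist[[k]].s[j]) s[j], {j, 1, k - 1}]];
s[k] = s[k]/Norm[s[k]];];

List[s[1], s[2], s[3], s[4]]


The resulting unit vectors, written here with each vector factored as a scalar multiple of an integer vector, are
s[1] = (1/Sqrt[7]) {1, -2, 1, 0, 1}
s[2] = (1/Sqrt[2422]) {-23, 4, -9, 14, 40}
s[3] = (1/Sqrt[685945]) {542, 515, 182, -14, 306}
s[4] = (1/Sqrt[17612530]) {-1707, 580, 2353, -2926, 514}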

The last command shows that either method for computing the 
orthonormal basis results in the same set of unit-length, mutually 
orthogonal vectors. 
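
As an independent cross-check, Mathematica also provides a built-in Orthogonalize command. The sketch below applies it to the same four vectors; it assumes the documented default behavior of Orthogonalize (a Gram-Schmidt orthonormalization applied to the vectors in the order given), so its output should agree with the vectors computed by hand above.

```mathematica
w1 = {1, -2, 1, 0, 1}; w2 = {-2, -2, 0, 2, 7};
w3 = {4, 1, 2, 0, 3}; w4 = {-1, 0, 0, 0, 1};

(* Built-in orthonormalization of the list of basis vectors *)
ps = Orthogonalize[{w1, w2, w3, w4}];

(* The dot product matrix of an orthonormal set is the identity *)
Simplify[ps.Transpose[ps]] == IdentityMatrix[4]

(* The first vector should be w1 scaled to unit length *)
ps[[1]] == w1/Norm[w1]
```

Writing the loop by hand remains useful for seeing each projection step; the built-in command is a convenient way to confirm the result.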




Homework Problems 


1. Explain what happens if one attempts to apply the Gram-Schmidt
orthonormalization process to a set of vectors that is linearly
dependent. It may be easiest to assume that
$K = \{\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_k, \vec{u}_{k+1}\}$ and that

$\vec{u}_{k+1} = \sum_{j=1}^{k} a_j\,\vec{u}_j$

for some scalars $a_j$, $1 \le j \le k$, of which at least one is nonzero.

2. What happens when one applies the Gram-Schmidt 
orthonormalization process to a set of vectors that are already 
mutually orthogonal? 


3. Convert each of the following sets of vectors to an orthonormal 
set of vectors: 


(a) $K_1$ = {(-2, 3), (6, 1)}
(b) $K_2$ = {(1, 0, 1), (0, -1, 1)}
(c) $K_3$ = {(1, 0, 1, 1), (0, -1, 2, 1), (3, 1, 0, -2)}


(d) $K_4$ = {(1, 1, 0, 1), (2, 1, -1, 1), (-2, -1, 1, 0)}


4. Project each of the following vectors onto the corresponding
orthonormal basis found in problem 3:


(a) (1, 1, 1) onto $K_2$ (b) (3, 4, -2) onto $K_2$ (c) (4, -5, -3) onto $K_2$


(d) (2, 1, 2, 3) onto $K_3$ (e) (3, 5, -5, 7) onto $K_3$ (f) (3, 5, -5, 7) onto $K_4$


5. A square matrix $P \in \mathbb{R}^{n \times n}$ whose columns form an orthonormal
basis of $\mathbb{R}^n$ is called an orthogonal matrix. Prove the following
identities. (Hint: Consider the matrix multiplication $P P^T$.)


(a) $P^{-1} = P^T$ (b) $\det(P) = \pm 1$
6. Use problem 5 to show that any real matrix A of the form


$A = \dfrac{1}{\sqrt{a^2 + b^2}} \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$
is orthogonal if $a^2 + b^2 \ne 0$.
7. Show that every 2 x 2 rotational matrix, given by


$\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix}$


for some angle $\theta$, is an orthogonal matrix.


8. Use problem 5 to show that if both P and Q are orthogonal
matrices of the same size, then their two products PQ and QP are
also orthogonal.


9. Let $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_n\}$ be any orthogonal basis of $\mathbb{R}^n$. Let S
be the subspace of $\mathbb{R}^n$ with basis $B_1 = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ for k < n.
What is a basis of $S^\perp$?


10. Let B be any finite orthonormal subset of $\mathbb{R}^n$. Prove that B is an
independent set. Is this also true if B is merely orthogonal? Is any
n-element orthonormal subset of $\mathbb{R}^n$ automatically a basis of $\mathbb{R}^n$?


Mathematica Problems 


1. Determine which of the following are orthonormal sets of 
vectors: 


0 
( 928 132 133 136 )} 
V914793" 914793" /914793' 914793 


2. Each of the following sets of vectors forms a basis for $\mathbb{R}^n$.
Construct the orthogonal matrix for each set and verify the
properties of homework problem 5.


(a) K = {(-2, 5), (7, 9)}
(b) K = {(1, 2, 1), (-1, 1, 1), (1, 0, 1)}
(c) K = {(1, -2, 1, 3), (1, -1, 1, 1), (1, 0, 1, 1), (2, 1, 1, -3)}


(d) K = {(2, -2, 1, -2, 3), (1, -2, 2, 1, -1), (-2, 3, 1, 1, 0),
(0, 2, 1, -3, 2), (2, 1, -1, -1, -2)}
3. A unitary matrix is the complex version of an orthogonal matrix.
A matrix P is unitary if $P^{-1} = \overline{P}^{\,T}$. Construct the orthonormal
basis, given the following set of vectors, and show that the resulting
matrix, whose columns are these vectors, is unitary:


K = {(2 + i, 1 - 2i, 2 + 3i), (4 - i, 3, -3i), (2 - 3i, 2 + 4i, 3)}


4. Write the vector $\vec{v} = (7, 1, -2, 9)$ as a linear combination of the
possible orthonormal sets in problem 1 (d) or (e), whichever is
orthonormal if both are not.


5. Write the vector $\vec{v} = (7, 1, -2, 9, -5)$ as a linear combination
of the orthonormal basis of $\mathbb{R}^5$ from problem 2 (d).


6. Let f(x) and g(x) be two continuous functions for x in the interval
$[0, 2\pi]$. We can define the dot product of these two functions as


$f(x) \cdot g(x) = \int_0^{2\pi} f(x)\,g(x)\,dx$


Show that the set of trigonometric functions


$\left\{ \dfrac{1}{\sqrt{2\pi}},\ \dfrac{1}{\sqrt{\pi}}\cos(x),\ \dfrac{1}{\sqrt{\pi}}\sin(x),\ \dfrac{1}{\sqrt{\pi}}\cos(2x),\ \dfrac{1}{\sqrt{\pi}}\sin(2x) \right\}$


is an orthonormal set with respect to the given definition of the dot
product.


7. Let f(x) and g(x) be two continuous functions for x in the interval
[-1, 1]. We can define the dot product of these two functions as


$f(x) \cdot g(x) = \int_{-1}^{1} f(x)\,g(x)\,dx$
Apply the Gram-Schmidt orthonormalization process to the set of
functions


$\{1, x, x^2, x^3\}$




Chapter 9 


Linear Maps from $\mathbb{R}^n$ to $\mathbb{R}^m$


9.1 Basics about Linear 
Maps 


In this section we will discuss the idea of linear maps, sometimes referred 
to as linear transformations. We begin with a definition. 


Definition 9.1.1. A linear map T is a function from $\mathbb{R}^n$ to $\mathbb{R}^m$ that
preserves linear combinations, and is denoted $T : \mathbb{R}^n \to \mathbb{R}^m$. Thus, for
$\vec{u}, \vec{v} \in \mathbb{R}^n$, and real scalars a and b, the linear map T has the property
that

(9.1) $T(a\,\vec{u} + b\,\vec{v}) = a\,T(\vec{u}) + b\,T(\vec{v})$


First, we explore properties common to all linear maps. One simple fact to
recognize is that every linear map T takes the zero vector $\vec{0}_n$ of $\mathbb{R}^n$ to the
zero vector $\vec{0}_m$ of $\mathbb{R}^m$. To see this, notice that


$T(\vec{0}_n) = T(\vec{0}_n + \vec{0}_n)$
$= T(\vec{0}_n) + T(\vec{0}_n)$
$= 2\,T(\vec{0}_n)$


which forces us to conclude that $T(\vec{0}_n) = \vec{0}_m$.


Now we need some examples of linear and nonlinear maps in order to see
what patterns might appear in the rules for such types of functions.


Example 9.1.1. Let $T : \mathbb{R}^3 \to \mathbb{R}$ have the rule $T((x, y, z)) = ax + by + cz$ for some
fixed real scalars a, b, c. Then T is a linear map since for any real scalars $\alpha$ and $\beta$, and any
two elements $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ of $\mathbb{R}^3$ we have

$T(\alpha\,(x_1, y_1, z_1) + \beta\,(x_2, y_2, z_2)) = T((\alpha x_1 + \beta x_2,\ \alpha y_1 + \beta y_2,\ \alpha z_1 + \beta z_2))$
$= a\,(\alpha x_1 + \beta x_2) + b\,(\alpha y_1 + \beta y_2) + c\,(\alpha z_1 + \beta z_2)$
$= \alpha\,(a x_1 + b y_1 + c z_1) + \beta\,(a x_2 + b y_2 + c z_2)$
$= \alpha\,T((x_1, y_1, z_1)) + \beta\,T((x_2, y_2, z_2))$


Consider what happens if we change the rule for T slightly by adding a constant to it:
$T((x, y, z)) = ax + by + cz + d$ for some fixed nonzero real scalar d. Then the new T is
not a linear map since, in general,


$T(\alpha\,(x_1, y_1, z_1) + \beta\,(x_2, y_2, z_2)) \ne \alpha\,T((x_1, y_1, z_1)) + \beta\,T((x_2, y_2, z_2))$


which is left for you to verify. This addition of d translates the original linear map so that
it no longer sends $\vec{0}_3$ to 0. This new version of T is called an affine map since it is a
linear map (the original T) plus a nonzero real scalar d.


Example 9.1.2. Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ have the rule $T((x, y)) = x(1,-5,2) + y(-7,3,9)$.
Then T is a linear map since for any real scalars $\alpha$ and $\beta$, and any two elements
$(x_1, y_1)$, $(x_2, y_2)$ of $\mathbb{R}^2$ we have

$$\begin{aligned}
T(\alpha(x_1,y_1) + \beta(x_2,y_2)) &= T((\alpha x_1 + \beta x_2,\ \alpha y_1 + \beta y_2)) \\
&= (\alpha x_1 + \beta x_2)(1,-5,2) + (\alpha y_1 + \beta y_2)(-7,3,9) \\
&= \alpha\,(x_1(1,-5,2) + y_1(-7,3,9)) + \beta\,(x_2(1,-5,2) + y_2(-7,3,9)) \\
&= \alpha\,T((x_1,y_1)) + \beta\,T((x_2,y_2))
\end{aligned}$$

If we change the rule for T to $T((x, y)) = x(1,-5,2) + y(-7,3,9) + (4,0,6)$, then the new T
is not a linear map, but as in Example 9.1.1, the new T is an affine map since it is a
translate of a linear map by the constant vector (4,0,6).
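
The linearity test used in Examples 9.1.1 and 9.1.2 can also be checked symbolically. The following is a minimal sketch, not part of the original example: the names linT, affT, and defect are hypothetical, and the symbols a, b, x1, y1, x2, y2 are assumed to have no values assigned in the current session.

```mathematica
(* Linear rule from Example 9.1.2 and its translate by (4, 0, 6) *)
linT[{x_, y_}] := x {1, -5, 2} + y {-7, 3, 9};
affT[{x_, y_}] := linT[{x, y}] + {4, 0, 6};

(* Defect of linearity: F(a p + b q) - (a F(p) + b F(q)).
   It is the zero vector for every a, b, p, q exactly when F is linear. *)
defect[F_] := Expand[
   F[a {x1, y1} + b {x2, y2}] - (a F[{x1, y1}] + b F[{x2, y2}])];

defect[linT]   (* {0, 0, 0} *)
defect[affT]   (* {4 - 4 a - 4 b, 0, 6 - 6 a - 6 b} *)
```

The second result is (1 - a - b)(4, 0, 6), so the affine map fails the linearity condition whenever a + b is not 1.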




Example 9.1.3. Let T : R: —> R: have the rule „(T Y, 2), Z 


2 2 
5T — y + 27, z). 
( y + z+ y T ) Then T is not a linear map due to the squaring 


of the variables x and z in the first component of the rule for T. If this squaring were not
present, then T would be a linear map. In fact, all linear maps must have as their rules
something similar to Example 9.1.2 or equivalently, Example 9.1.4.


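Example 9.1.4. Let $T : \mathbb{R}^3 \to \mathbb{R}^4$ follow the rule

$$\begin{aligned}
T((x,y,z)) &= (5x - y + z,\ x + 2y + z,\ 7x - z,\ y + 4z) \\
&= x\,(5,1,7,0) + y\,(-1,2,0,1) + z\,(1,1,-1,4) \\
&= x\,T((1,0,0)) + y\,T((0,1,0)) + z\,T((0,0,1))
\end{aligned}$$

If we treat these vectors as column matrices, then the rule for T can also be rewritten as

$$T\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \begin{bmatrix} 5x - y + z \\ x + 2y + z \\ 7x - z \\ y + 4z \end{bmatrix} = \begin{bmatrix} 5 & -1 & 1 \\ 1 & 2 & 1 \\ 7 & 0 & -1 \\ 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

Since matrix multiplication is distributive, this T is a linear map.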
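
As a quick check of Example 9.1.4, the matrix can be assembled in Mathematica from the images of the standard basis vectors. This is only a sketch: the names T914 and stdA are hypothetical, and x, y, z are assumed to be unassigned symbols.

```mathematica
(* Rule from Example 9.1.4, written for a list argument *)
T914[{x_, y_, z_}] := {5 x - y + z, x + 2 y + z, 7 x - z, y + 4 z};

(* The rows of IdentityMatrix[3] are the standard basis vectors;
   their images become the columns of the matrix *)
stdA = Transpose[T914 /@ IdentityMatrix[3]];
stdA // MatrixForm

(* Multiplication by stdA reproduces the rule for symbolic x, y, z *)
Expand[stdA.{x, y, z} - T914[{x, y, z}]]   (* {0, 0, 0, 0} *)
```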


From these examples, we can come to a couple of immediate conclusions, 
which we now present as theorems. 


Theorem 9.1.1. $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map if and only if

$$T((x_1, x_2, \ldots, x_n)) = x_1\,T((1,0,\ldots,0)) + x_2\,T((0,1,0,\ldots,0)) + \cdots + x_n\,T((0,0,\ldots,1))$$


Second and equivalently, if we write our vectors as columns, we have the 
following theorem. 


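Theorem 9.1.2. $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map if and only if

$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\right) = A \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

where the m x n matrix A has as its columns the n vectors in $\mathbb{R}^m$

$$T((1,0,\ldots,0)),\ T((0,1,0,\ldots,0)),\ \ldots,\ T((0,0,\ldots,1))$$

and we are writing the output of T as a column vector in $\mathbb{R}^m$.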


There are two very important ways we can generate new subspaces of $\mathbb{R}^n$
and $\mathbb{R}^m$ from original subspaces of each using a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$:
by taking the image and inverse image of the original subspaces by T.


The process called image uses T to take a subspace S of $\mathbb{R}^n$ and returns a
subspace T(S) of $\mathbb{R}^m$, while the one called inverse image reverses things
and uses T to take a subspace K of $\mathbb{R}^m$ and returns a subspace $T^{-1}(K)$
of $\mathbb{R}^n$.


Definition 9.1.2. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map with S a subspace of
$\mathbb{R}^n$. Then the image T(S) of the subspace S under T is given by

$$T(S) = \{T(\vec{v}) \mid \vec{v} \in S\}$$

which is a subspace of $\mathbb{R}^m$.


In order to see that T(S) is truly a subspace of $\mathbb{R}^m$, let $T(\vec{u}), T(\vec{v}) \in T(S)$
for $\vec{u}, \vec{v} \in S$, and let a, b be two real scalars. Then we have
$a\,T(\vec{u}) + b\,T(\vec{v}) = T(a\vec{u} + b\vec{v}) \in T(S)$, since $a\vec{u} + b\vec{v}$ is an element of S,
and T is linear. Thus, T(S) is a subspace of $\mathbb{R}^m$.


Definition 9.1.3. If K is a subspace of $\mathbb{R}^m$, then

$$T^{-1}(K) = \{\vec{v} \in \mathbb{R}^n \mid T(\vec{v}) \in K\}$$

is a subspace of $\mathbb{R}^n$, called the inverse image of the subspace K under
the linear map $T : \mathbb{R}^n \to \mathbb{R}^m$.


In order to see that $T^{-1}(K)$ is a subspace of $\mathbb{R}^n$, let $\vec{u}, \vec{v} \in T^{-1}(K)$,
and let a, b be two real scalars. Then we have that

$$T(a\vec{u} + b\vec{v}) = a\,T(\vec{u}) + b\,T(\vec{v})$$

is an element of K since $T(\vec{u}), T(\vec{v}) \in K$ and K is a subspace of
$\mathbb{R}^m$. Thus, $T^{-1}(K)$ is a subspace of $\mathbb{R}^n$.

Example 9.1.5. We will now find the image T(S) of the subspace

$$S = \{a\,(1,2,-3) \mid a \in \mathbb{R}\}$$

and inverse image $T^{-1}(K)$ of the subspace

(9.2) $K = \{a\,(-1,4,-2,1,-3) + b\,(4,-1,-6,-5,4) \mid a, b \in \mathbb{R}\}$

using the linear map $T : \mathbb{R}^3 \to \mathbb{R}^5$ given by

(9.3) $T((x,y,z)) = (5x - 3y + 2z,\ x - y - z,\ -7x + z,\ 6y - 11z,\ 8x + y - 5z)$

First, note that

$$\begin{aligned}
T(S) &= \{T(a\,(1,2,-3)) \mid a \in \mathbb{R}\} \\
&= \{a\,T((1,2,-3)) \mid a \in \mathbb{R}\} \\
&= \{a\,(-7,2,-10,45,25) \mid a \in \mathbb{R}\}
\end{aligned}$$

so the vector (-7, 2, -10, 45, 25) spans the subspace T(S) and since it is an
independent set, it is also a basis of T(S), which says that T(S) has dimension one.

In order to find $T^{-1}(K)$, we need to solve the equation

(9.4) $T((x,y,z)) = a\,(-1,4,-2,1,-3) + b\,(4,-1,-6,-5,4) \in \mathbb{R}^5$

for x, y, and z in terms of a and b. To solve equation (9.4), we simply have to solve the
following set of equations, which equates corresponding components of the left and right
sides of the equation:

(9.5)
$$\begin{aligned}
5x - 3y + 2z &= -a + 4b \\
x - y - z &= 4a - b \\
-7x + z &= -2a - 6b \\
6y - 11z &= a - 5b \\
8x + y - 5z &= -3a + 4b
\end{aligned}$$

In augmented matrix form, with the first three columns corresponding to the LHS of
system (9.5), and last two columns to the RHS, we have

$$\left[\begin{array}{rrr|rr}
5 & -3 & 2 & -1 & 4 \\
1 & -1 & -1 & 4 & -1 \\
-7 & 0 & 1 & -2 & -6 \\
0 & 6 & -11 & 1 & -5 \\
8 & 1 & -5 & -3 & 4
\end{array}\right]$$

which row-reduces to

$$\left[\begin{array}{rrr|rr}
1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right]$$

Do not forget that there are two columns in the rref matrix above representing the RHS of
equation (9.5). We find that the solution is given by x = b, y = b, z = b, and a = 0.
Therefore, we have that any vector of the form

$$(x, y, z) = b\,(1,1,1)$$

will satisfy equation (9.4). We can therefore conclude that $T^{-1}(K)$ has dimension one
and is given by

$$T^{-1}(K) = \{b\,(1,1,1) \mid b \in \mathbb{R}\}$$

and {(1,1,1)} is the basis for $T^{-1}(K)$. Remember that K, defined in equation
(9.2), has basis {(-1,4,-2,1,-3), (4,-1,-6,-5,4)}. An interesting item
to notice in this example is that T((1,1,1)) = (4,-1,-6,-5,4), and since
(1,1,1) was the only element in the basis of $T^{-1}(K)$, the equation
T((x,y,z)) = (-1,4,-2,1,-3) has no solution. Therefore, if we defined

$$K_2 = \{a\,(-1,4,-2,1,-3) \mid a \in \mathbb{R}\}$$

then $T^{-1}(K_2) = \{\vec{0}_3\}$. Note that for most subspaces K, $T^{-1}(K) = \{\vec{0}_3\}$.
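
The computations of Example 9.1.5 can be reproduced in Mathematica. The following is a sketch under the assumption that the rule (9.3) and the spanning vectors of K in (9.2) are as stated in the example; the names T5, k1, k2, A5, and aug are hypothetical.

```mathematica
(* Rule (9.3) and the two spanning vectors of K from (9.2) *)
T5[{x_, y_, z_}] := {5 x - 3 y + 2 z, x - y - z, -7 x + z, 6 y - 11 z, 8 x + y - 5 z};
k1 = {-1, 4, -2, 1, -3};
k2 = {4, -1, -6, -5, 4};

(* Image of S: apply T to the spanning vector (1, 2, -3) *)
T5[{1, 2, -3}]   (* {-7, 2, -10, 45, 25} *)

(* Augmented matrix of system (9.5): matrix of T, then k1 and k2 as columns *)
A5 = Transpose[T5 /@ IdentityMatrix[3]];
aug = Join[A5, Transpose[{k1, k2}], 2];
RowReduce[aug] // MatrixForm
(* rows: {1,0,0,0,1}, {0,1,0,0,1}, {0,0,1,0,1}, {0,0,0,1,0}, {0,0,0,0,0} *)

(* The basis vector of the inverse image is sent to k2 *)
T5[{1, 1, 1}] == k2   (* True *)
```

The pivot in the fourth column is what forces a = 0: the vector k1 is not in the image of T, so only multiples of k2 can be reached.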


It turns out that every linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ can be represented as the
multiplication of some m x n matrix A by vectors in $\mathbb{R}^n$. The question is:
How do we find the matrix A? The answer is actually hinted at in the
previous examples. To determine A, we start by taking the standard basis
vectors $S_n = \{\vec{s}_1, \vec{s}_2, \ldots, \vec{s}_n\}$ of $\mathbb{R}^n$, which consists of the rows of the n
x n identity matrix $I_n$, and plug them into T. The vectors
$\{T(\vec{s}_1), T(\vec{s}_2), \ldots, T(\vec{s}_n)\}$ of $\mathbb{R}^m$ are the columns of the matrix A.


We can verify that this matrix A works to give us the linear map T by first
taking an element $\vec{x} \in \mathbb{R}^n$ expressed as

$$\vec{x} = (x_1, x_2, \ldots, x_n) = x_1\vec{s}_1 + x_2\vec{s}_2 + \cdots + x_n\vec{s}_n$$

and plugging it into T. We get that

$$T(\vec{x}) = x_1\,T(\vec{s}_1) + x_2\,T(\vec{s}_2) + \cdots + x_n\,T(\vec{s}_n)$$

and so writing $\vec{x}$, $T(\vec{x})$, $T(\vec{s}_1), \ldots, T(\vec{s}_n)$ as columns, we arrive at the
matrix equation

$$T(\vec{x}) = A\vec{x}$$

where

(9.6) $A = \left[\, T(\vec{s}_1) \mid T(\vec{s}_2) \mid \cdots \mid T(\vec{s}_n) \,\right]$


Of course, the argument requires us to use the standard basis to determine
A, which in turn requires us to know the image of each standard basis
vector under the map T. The question becomes: What if we instead know the
images of a different basis for $\mathbb{R}^n$? Clearly, the
argument above does not apply directly. However, the claim is that given a basis
$B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_n\}$ of $\mathbb{R}^n$, if we know the values of $T(\vec{w}_k)$ for $1 \le k \le n$,
then we can determine A. Before we go through the process in an abstract
fashion, we will consider a concrete example. Pay attention to the ideas,
though, as they will be used in the general situation.


Example 9.1.6. We will define $T : \mathbb{R}^3 \to \mathbb{R}^2$ to be the linear map such that

$$\begin{aligned}
T((2,5,-1)) &= (4,-9) \\
T((-7,1,3)) &= (1,6) \\
T((4,-8,0)) &= (-2,10)
\end{aligned}$$

We now wish to find the 2x3 matrix A such that $T(\vec{x}) = A\vec{x}$ for all vectors $\vec{x} \in \mathbb{R}^3$.
The columns of the matrix A are the vectors T((1,0,0)), T((0,1,0)), and
T((0,0,1)), in order. We must find out how to write the standard basis

(9.7) $\{\vec{i}, \vec{j}, \vec{k}\} = \{(1,0,0), (0,1,0), (0,0,1)\}$

of $\mathbb{R}^3$ in terms of the new basis

(9.8) $\{\vec{u}, \vec{v}, \vec{w}\} = \{(2,5,-1), (-7,1,3), (4,-8,0)\}$


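Now let us write down our problem in matrix notation. First, in most general terms, we
wish to find the matrix A such that

$$T((x,y,z)) = A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$$

where $(x, y, z) \in \mathbb{R}^3$ and $(\alpha, \beta) \in \mathbb{R}^2$. We will also require that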


(9.9) 


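Furthermore, we also get the following equations as a result of the definition of the
standard basis $\{\vec{i}, \vec{j}, \vec{k}\}$ for $\mathbb{R}^3$, where $A_{rc}$ denotes the entry of A in row r and column c:

(9.10) $T(\vec{i}) = \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}, \quad T(\vec{j}) = \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}, \quad T(\vec{k}) = \begin{bmatrix} A_{13} \\ A_{23} \end{bmatrix}$

Also, the following expressions involving the standard basis will prove useful:

(9.11)
$$\begin{aligned}
(2,5,-1) &= 2\vec{i} + 5\vec{j} - \vec{k} \\
(-7,1,3) &= -7\vec{i} + \vec{j} + 3\vec{k} \\
(4,-8,0) &= 4\vec{i} - 8\vec{j}
\end{aligned}$$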




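Combining equations (9.9)-(9.11) gives

$$\begin{aligned}
T((2,5,-1)) &= 2\,T(\vec{i}) + 5\,T(\vec{j}) - T(\vec{k}) = (4,-9) \\
T((-7,1,3)) &= -7\,T(\vec{i}) + T(\vec{j}) + 3\,T(\vec{k}) = (1,6) \\
T((4,-8,0)) &= 4\,T(\vec{i}) - 8\,T(\vec{j}) + 0\,T(\vec{k}) = (-2,10)
\end{aligned}$$

Using the definitions of $T(\vec{i})$, $T(\vec{j})$, and $T(\vec{k})$ described above, we see that
the previous three-equation system now becomes

(9.12)
$$\begin{aligned}
T((2,5,-1)) &= 2\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} + 5\begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} - \begin{bmatrix} A_{13} \\ A_{23} \end{bmatrix} = \begin{bmatrix} 4 \\ -9 \end{bmatrix} \\
T((-7,1,3)) &= -7\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} + \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} + 3\begin{bmatrix} A_{13} \\ A_{23} \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \end{bmatrix} \\
T((4,-8,0)) &= 4\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} - 8\begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} + 0\begin{bmatrix} A_{13} \\ A_{23} \end{bmatrix} = \begin{bmatrix} -2 \\ 10 \end{bmatrix}
\end{aligned}$$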


Note that we have used the linearity of the map T to arrive at these equations. The system
of equations can now be written in matrix form:

(9.13) $\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{bmatrix} \begin{bmatrix} 2 & -7 & 4 \\ 5 & 1 & -8 \\ -1 & 3 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 1 & -2 \\ -9 & 6 & 10 \end{bmatrix}$

We write this symbolically as $A M^T = B$, where

$$B = \begin{bmatrix} 4 & 1 & -2 \\ -9 & 6 & 10 \end{bmatrix}, \qquad M = \begin{bmatrix} 2 & 5 & -1 \\ -7 & 1 & 3 \\ 4 & -8 & 0 \end{bmatrix}$$

Taking the given basis vectors as the rows of M
seems a natural way to define the matrix that right-multiplies A when looking at equation
(9.12); hence the requirement of using $M^T$ in the equation $A M^T = B$. The columns of B
are the vectors in $\mathbb{R}^2$ corresponding to the image of the columns of the matrix $M^T$, or
the rows of M, under the map T. So now that we have our equation involving the
unknown matrix A, with the known matrices M and B, it is simple enough to solve for A:


(9.14) $A = B\,(M^T)^{-1}$


We will now perform the necessary calculations in Mathematica, also making sure to
verify the independence of the vectors in the domain of T, hence making them a basis for
$\mathbb{R}^3$.

u = {2, 5, -1}; v = {-7, 1, 3}; w = {4, -8, 0};
M = {u, v, w};

RowReduce[M] // MatrixForm


1 0 0
0 1 0
0 0 1


(B = {{4, 1, -2}, {-9, 6, 10}}) // MatrixForm


 4 1 -2
-9 6 10


(A = B.Inverse[Transpose[M]]) // MatrixForm


 9/7    25/28   85/28
-1/7   -37/28   59/28


Now, let us check that this 2x3 matrix A really is our linear map T through multiplication
by A.


A.Transpose[M] // MatrixForm

 4 1 -2
-9 6 10

Next consider the more abstract case, where $B = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_n\}$ is a
basis of $\mathbb{R}^n$, with known values for $\vec{v}_j = T(\vec{w}_j) \in \mathbb{R}^m$ for $1 \le j \le n$. Since
we wish to express T as multiplication by a matrix $A \in \mathbb{R}^{m \times n}$, the
following n equations must be satisfied:

(9.15) $A\vec{w}_j = \vec{v}_j, \quad 1 \le j \le n$

We can put all n of these equations into the form AW = V, where

(9.16) $W = \left[\, \vec{w}_1 \mid \vec{w}_2 \mid \cdots \mid \vec{w}_n \,\right], \qquad V = \left[\, \vec{v}_1 \mid \vec{v}_2 \mid \cdots \mid \vec{v}_n \,\right]$

We must make sure that the dimensions of the matrices allow us to
perform the preceding matrix multiplication. As previously stated, $A \in \mathbb{R}^{m \times n}$,
and W consists of n vectors treated as columns, each of length n;
therefore, $W \in \mathbb{R}^{n \times n}$. Similarly, V consists of n vectors of dimension m
represented as column vectors, so $V \in \mathbb{R}^{m \times n}$. Since W is a square matrix
whose columns form a basis for $\mathbb{R}^n$, it is invertible. Therefore, we can
conclude that $A = V W^{-1}$. It is quite easy to verify that this general
argument applied to the previous problem once again yields equation
(9.13). We use Mathematica to solve Example 9.1.6 with the more general
approach:


(W = Transpose[M]) // MatrixForm


V = B;
V.Inverse[W] // MatrixForm


 9/7    25/28   85/28
-1/7   -37/28   59/28
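
The general recipe $A = V W^{-1}$ can be wrapped in a small reusable function. This is a sketch only: matrixFromImages, ws, vs, and A916 are hypothetical names, and the function assumes the basis vectors and their images are supplied as rows.

```mathematica
(* Hypothetical helper: the rows of ws are the basis vectors w_j of R^n,
   the rows of vs are the corresponding images v_j = T(w_j) in R^m.
   The condition rejects input whose rows do not form a basis. *)
matrixFromImages[ws_, vs_] /;
   MatrixRank[ws] == Length[First[ws]] == Length[ws] :=
  Transpose[vs].Inverse[Transpose[ws]];

(* Data of Example 9.1.6 *)
ws = {{2, 5, -1}, {-7, 1, 3}, {4, -8, 0}};
vs = {{4, -9}, {1, 6}, {-2, 10}};

A916 = matrixFromImages[ws, vs];
A916 // MatrixForm   (* {{9/7, 25/28, 85/28}, {-1/7, -37/28, 59/28}} *)

(* Each basis vector is sent to its prescribed image *)
(A916.# & /@ ws) == vs   (* True *)
```

Transposing turns the row lists into the column matrices W and V of equation (9.16), so the function body is exactly $V W^{-1}$.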




Homework Problems 


1. Given a linear map $T : \mathbb{R}^n \to \mathbb{R}^n$, expressed in terms of matrix
multiplication by $A\vec{x} = \vec{y}$, where A is invertible, does $A^{-1}\vec{y} = \vec{x}$
correspond to the inverse map $T^{-1} : \mathbb{R}^n \to \mathbb{R}^n$?

2. For each of the given linear maps, determine the matrix A that
satisfies $T(\vec{x}) = A\vec{x}$:


(a) T((2,1)) = (0,1)             (b) T((2,1)) = (0,1,1)
    T((1,2)) = (1,1)                 T((1,2)) = (1,1,0)

(c) T((1,0,1)) = (0,1)           (d) T((0,0,-1)) = (1,1,1)
    T((0,1,0)) = (1,1)               T((1,1,1)) = (0,1,2)
    T((1,1,0)) = (1,0)               T((1,2,0)) = (0,1,0)


3. Compute the inverse map, $T^{-1}$, to the maps from problem 2 part
(a) and part (d).


4. So far, we have considered linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ where
we know the image of a basis of $\mathbb{R}^n$. Consider the following map:


T((1,2,0,1)) = (-5,-1)
T((-1,0,3,2)) = (8,14)

Explain why this map cannot be defined by a unique matrix $A \in \mathbb{R}^{2 \times 4}$.


5. Construct two 2x4 matrices that satisfy the map given in problem 
4. 


6. Define $A_b$ and $A_c$ to be the matrices corresponding to the linear
maps from parts (b) and (c) of problem 2. Verify the following:
(a) $A_c A_b\,(2,1)^T = (1,3)^T$    (b) $A_c A_b\,(1,2)^T = (1,0)^T$


7. Construct a linear map from R? to the subspace of R: given by 


$$S = \{(x, 0, z) \mid x, z \in \mathbb{R}\}$$


8. For the linear map T in Example 9.1.4, find the image T(S) of the
subspace

$$S = \{a\,(1,2,-3) + b\,(-4,5,1) \mid a, b \in \mathbb{R}\}$$

and inverse image $T^{-1}(K)$ of the subspace

$$K = \{a\,(5,4,6,5) + b\,(6,5,5,9) + c\,(-1,2,-3,7) \mid a, b, c \in \mathbb{R}\}$$

Give a basis and the dimension for both the image T(S) and the
inverse image $T^{-1}(K)$.
9. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ have the rule

$$T((x,y)) = x\,(\cos(\theta), -\sin(\theta)) + y\,(\sin(\theta), \cos(\theta))$$

Show that T is a linear map and explain what it does geometrically.

10. Let S be a subspace of $\mathbb{R}^n$, and $T : \mathbb{R}^n \to \mathbb{R}^n$ be defined by
$T(\vec{v}) = \mathrm{proj}_S(\vec{v})$, for $\vec{v} \in \mathbb{R}^n$.

(a) Is the function T a linear map? Explain why if it is, but if it
is not, give an example to illustrate why not.

(b) If the function T in part (a) is a linear map, what is the
image $T(\mathbb{R}^n)$?

(c) If the function T in part (a) is a linear map, what is $T^{-1}(\vec{0}_n)$?

11. (a) Let $T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^m \to \mathbb{R}^k$ be two linear maps.
Show that their composite, $S \circ T : \mathbb{R}^n \to \mathbb{R}^k$, is also a linear map.
Recall that the composite function $S \circ T : \mathbb{R}^n \to \mathbb{R}^k$ has the rule
$(S \circ T)(\vec{v}) = S(T(\vec{v}))$, for all $\vec{v} \in \mathbb{R}^n$.

(b) Let K be a subspace of $\mathbb{R}^k$. Explain why the inverse
image is

$$(S \circ T)^{-1}(K) = T^{-1}(S^{-1}(K))$$




12. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map that has an inverse function
$T^{-1} : \mathbb{R}^m \to \mathbb{R}^n$. Prove that $T^{-1}$ is also a linear map and n = m.

13. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map that has an inverse linear map
$T^{-1} : \mathbb{R}^n \to \mathbb{R}^n$. Prove that if T can be written as the matrix
multiplication $T(\vec{x}) = A\vec{x}$, then $T^{-1}$ can be written as the matrix
multiplication $T^{-1}(\vec{y}) = A^{-1}\vec{y}$.

14. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map and B be a basis of $\mathbb{R}^n$.
Show that T has an inverse linear map $T^{-1}$ if and only if T(B) is also
a basis of $\mathbb{R}^n$. This says that T is invertible if and only if T sends a
basis to a basis.

15. How can you define a linear map T : V → W, for V a
subspace of $\mathbb{R}^n$ and W a subspace of $\mathbb{R}^m$? Explain how the
material of this section can be altered for this new more general
linear map. Does everything about linear maps also work if we
replace $\mathbb{R}$ by $\mathbb{C}$ and even mix $\mathbb{R}$ and $\mathbb{C}$?


Mathematica Problems 


1. For each of the given linear maps, determine the matrix A that
satisfies $T(\vec{x}) = A\vec{x}$:


(a) T((1,-2,1,—1)) = (1,2,1,0) (b) T((—3, 0, 0, —7)) = (1, -2, 1, -1, 2) 
T((1,0, 1,1)) = (1,0, 1, 1) T((1,0, -1,0)) = (1, -3, 1,9, 0) 
T4, 2,1,0)) = (0,1,1,1) T((-3,2,1,5}) = (4,1, 1,8, 0) 
T((0,1,1,1)) = (1,-2,1,-1) T((0,—2,1,0}) = (1, -7, 1,0,3} 
© TU, ~2,1,-1)) = (3,2,—1) (d) T((-1,1,-1,-1,0}) = (1, -2, 1, -1,0) 
T((1,0, 1, 1)) = (1, -3, 1) T((1,0, 1, 1,0)) = (1,0, 1, 1,0) 
T((1, 2, 1,0)) = (7, 1,0) T((0, 1, 1,0, 1)) = (0,1, 1, 1, —1) 
T((O, 1, 1,1)) = (1, —2,6) T((1,1, 1,0, 1)) = (1,2, 1,0, 1) 
T((0,0,1,0,0)) = (1, -2, 1, -1, 1) 


2. Compute the inverse map to those maps that are invertible from 
problem 1. 


3. Consider the following two maps, S and T, defined as follows:


S((3,-2,1,-1)) = (1,2,1,-5,6)       T((7,-2,1,-1,3)) = (-3,2,-1,2,-2,2)
S((-2,0,1,1)) = (1,0,2,1,3)         T((1,0,2,1,3)) = (1,1,0,1,1,1)
S((1,2,1,0)) = (0,1,-3,1,4)         T((0,1,-3,1,4)) = (-1,2,1,0,0,1)
S((0,-4,1,5)) = (7,-2,1,-1,3)       T((1,2,1,-5,6)) = (1,-1,1,2,-1,-1)
                                    T((4,-2,2,0,3)) = (1,5,-2,1,9,-6)


(a) Compute the matrices $A_S$ and $A_T$ corresponding to the maps
S and T, respectively.


(b) Verify that the composition of the maps S and T, given by
$A_T A_S$, takes each vector of $\mathbb{R}^4$ in the definition of S to its
corresponding counterpart vector in $\mathbb{R}^6$ found in the range of
the definition of T.


4. Verify your answer to homework problem 8. 
5. Let $T : \mathbb{R}^5 \to \mathbb{R}^8$ have the following rule:
T((x,y, z u, v)) = (5a — y + z + 2u — v, 3t + 2y +z -u +v, 
Tr +y + z — 6u + 9v, T — 4y + 10z + 13u — 11v, 


— T + 3y — 9z + u + v, —3x + 6y + 8z — Tu + Gv, 
— 2x + 15y + z — 10u + 2v, 17x — 3y + 142z — 5u + 8v) 


Find the image, T(S), of the subspace 


S = {a (1,2, —3, —1, —6) + b (—1, 5, —12, 17, —9) 
+e(1,—1,2,-5,-1) +d (1,0,1,0, —1) | a,b,c,d € R} 


and the inverse image, $T^{-1}(K)$, of the subspace


K = {a (5,6,9,7, —7, 11, 14, 28) + b (2, 1,4, 12, —7,7, —7, 17) 
+ ¢(—4,2,1,6,-1,3, —5,9) | a,b,c € R} 


Give a basis and the dimension for both the image, T(S), and the
inverse image, $T^{-1}(K)$.

6. Determine which of the linear maps given in problem 1 have
inverse functions, and find the matrix B that represents these inverse maps
as matrix multiplication. Show that this matrix B is the inverse to
the matrix that represents T.




9.2 The Kernel and Image Subspaces of a Linear Map


From Section 9.1, we know that linear maps take subspaces of $\mathbb{R}^n$ to
subspaces of $\mathbb{R}^m$. For a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, the most important
two subspaces of $\mathbb{R}^n$ and $\mathbb{R}^m$ are the kernel of T, denoted Ker(T), and the
image of T, denoted Im(T), corresponding to $T^{-1}(\vec{0}_m)$ and $T(\mathbb{R}^n)$,
respectively.


Definition 9.2.1. The kernel of a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, denoted
Ker(T), is defined as

$$\mathrm{Ker}(T) = \{\vec{v} \in \mathbb{R}^n \mid T(\vec{v}) = \vec{0}_m\} = T^{-1}(\vec{0}_m)$$


Definition 9.2.2. The image of a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, denoted
Im(T), is defined as

$$\mathrm{Im}(T) = \{T(\vec{x}) \mid \vec{x} \in \mathbb{R}^n\} = T(\mathbb{R}^n)$$


The first thing to notice is that the kernel of T is a subspace of $\mathbb{R}^n$, while
the image of T is a subspace of $\mathbb{R}^m$. These two subspaces contain
important information about the nature of the linear map T.


Example 9.2.1. Let us do an example of computing the kernel and image of the linear
map $T : \mathbb{R}^4 \to \mathbb{R}^7$ given by

$$\begin{aligned}
T((x,y,z,w)) = (&x - y + z + w,\ 3x + 5y + 8w,\ x + 7z + 8w, \\
&x + y + 2z + 4w,\ 6y + 6w,\ x + 2z + 3w,\ -8x + y + 11z + 4w)
\end{aligned}$$

We first find the 7x4 matrix A that represents T as a multiplication; then we solve the
system $A\vec{u} = \vec{0}_7$ to get the kernel of T. The image of T is the subspace of $\mathbb{R}^7$
spanned by the columns of the matrix A. We will find a basis for the image of T by
applying RowReduce to $A^T$ and taking its nonzero rows:


T[x_, y_, z_, w_] := {x - y + z + w, 3 x + 5 y + 8 w, x + 7 z + 8 w, x +
    y + 2 z + 4 w, 6 y + 6 w, x + 2 z + 3 w, -8 x + y + 11 z + 4 w};
T[-3, 5, 1, -8]


{-15, -48, -60, -28, -18, -25, 8}


(A = Transpose[{T[1, 0, 0, 0], T[0, 1, 0, 0], T[0, 0, 1, 0], T[0, 0, 0, 1]}]) //
 MatrixForm


 1 -1  1 1
 3  5  0 8
 1  0  7 8
 1  1  2 4
 0  6  0 6
 1  0  2 3
-8  1 11 4


RowReduce[Join[A, ConstantArray[0, {7, 1}], 2]] // MatrixForm


1 0 0 1 0
0 1 0 1 0
0 0 1 1 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

The rref matrix given above tells us that the kernel of T can be expressed by the equation

$$\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = w \begin{bmatrix} -1 \\ -1 \\ -1 \\ 1 \end{bmatrix}$$

for w arbitrary. So Ker(T) has dimension one, with basis vector
(-1, -1, -1, 1). Now let us find a basis for the image of T. Before we do this,
however, we should have a strategy. It was easy to compute Ker(T), since we simply
determined the spanning set for the homogeneous solution. To determine the image of T,
we first write the map as follows:

(9.17)
$$\begin{aligned}
T((x,y,z,w)) &= x\,(1,3,1,1,0,1,-8) + y\,(-1,5,0,1,6,0,1) \\
&\quad + z\,(1,0,7,2,0,2,11) + w\,(1,8,8,4,6,3,4) \\
&= x\,\vec{r}_1 + y\,\vec{r}_2 + z\,\vec{r}_3 + w\,\vec{r}_4
\end{aligned}$$

Then the matrix A that turns T into multiplication by A is the 7x4 matrix with columns
$\vec{r}_1, \vec{r}_2, \vec{r}_3, \vec{r}_4$:

$$A = \left[\, \vec{r}_1 \mid \vec{r}_2 \mid \vec{r}_3 \mid \vec{r}_4 \,\right]$$

Also, $\vec{r}_j \in \mathbb{R}^7$ for $1 \le j \le 4$ since $\vec{r}_1 = T((1,0,0,0))$, $\vec{r}_2 = T((0,1,0,0))$,
$\vec{r}_3 = T((0,0,1,0))$ and $\vec{r}_4 = T((0,0,0,1))$. The set $\{\vec{r}_1, \vec{r}_2, \vec{r}_3, \vec{r}_4\}$
spans Im(T), since by equation (9.17) every output of T is a linear
combination of the $\vec{r}_j$ vectors. To determine the spanning set corresponding to the set
of $\vec{r}_j$ vectors, we simply transpose the matrix A, whose columns are the $\vec{r}_j$ vectors,
and row-reduce. The result is given below:


(ImT = RowReduce[Transpose[A]]) // MatrixForm


1 0 0   4/51  -42/17  25/51  -356/51
0 1 0  11/51   12/17   5/51   -61/51
0 0 1  14/51    6/17  11/51   131/51
0 0 0    0       0      0        0


From the matrix ImT above, we see that the subspace Im(T) has dimension three, with the
first three rows of the matrix ImT as a basis. Thus, any vector in Im(T) can be expressed
as a linear combination of the three vectors:

$$\left\{ \left(1,0,0,\tfrac{4}{51},-\tfrac{42}{17},\tfrac{25}{51},-\tfrac{356}{51}\right),\ \left(0,1,0,\tfrac{11}{51},\tfrac{12}{17},\tfrac{5}{51},-\tfrac{61}{51}\right),\ \left(0,0,1,\tfrac{14}{51},\tfrac{6}{17},\tfrac{11}{51},\tfrac{131}{51}\right) \right\}$$

Note that the dimension of the kernel of T added together with the dimension of the
image of T is four, which is the dimension of $\mathbb{R}^4$, the domain of T. Is this a coincidence
or a general fact? If it is a general fact, then why is it true?


Happily, it is true in general. If $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map that can be
represented in matrix form by $T(\vec{x}) = A\vec{x}$, then the dimension of Im(T)
is equal to the column rank of A, which is equivalent to the number of
dependent variables in solving $A\vec{x} = \vec{0}_m$, which is how we compute
Ker(T) in the first place. This tells us that the number of
independent variables must be the dimension of Ker(T), since the number
of independent and dependent variables must sum to n.

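Theorem 9.2.1. Given a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, Im(T) and Ker(T)
satisfy the property:

$$\dim(\mathrm{Im}(T)) + \dim(\mathrm{Ker}(T)) = n$$

Proof. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then $\mathrm{Ker}(T) = T^{-1}(\vec{0}_m)$ is a
subspace of $\mathbb{R}^n$. Its orthogonal complement, $\mathrm{Ker}(T)^\perp$, is also a subspace
of $\mathbb{R}^n$, where the sum and intersection of the two subspaces satisfy

(9.18) $\mathrm{Ker}(T) + \mathrm{Ker}(T)^\perp = \mathbb{R}^n, \qquad \mathrm{Ker}(T) \cap \mathrm{Ker}(T)^\perp = \{\vec{0}_n\}$

Hence, if we have any basis $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ of Ker(T), and any
basis $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_p\}$ of $\mathrm{Ker}(T)^\perp$, then the union,
$\{\vec{v}_1, \ldots, \vec{v}_k, \vec{w}_1, \ldots, \vec{w}_p\}$, is a basis for $\mathbb{R}^n$ with k + p = n.


Now, if we take any element $\vec{u} \in \mathbb{R}^n$, we can express it as

$$\vec{u} = a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_k\vec{v}_k + b_1\vec{w}_1 + b_2\vec{w}_2 + \cdots + b_p\vec{w}_p$$

for real scalars $a_i$ and $b_j$, $1 \le i \le k$ and $1 \le j \le p$. Then, plugging $\vec{u}$ into T,
we get

$$\begin{aligned}
T(\vec{u}) &= a_1 T(\vec{v}_1) + a_2 T(\vec{v}_2) + \cdots + a_k T(\vec{v}_k) + b_1 T(\vec{w}_1) + b_2 T(\vec{w}_2) + \cdots + b_p T(\vec{w}_p) \\
&= b_1 T(\vec{w}_1) + b_2 T(\vec{w}_2) + \cdots + b_p T(\vec{w}_p)
\end{aligned}$$

since $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\} \subset \mathrm{Ker}(T)$ says that $T(\vec{v}_j) = \vec{0}_m$, for $1 \le j \le k$.
So the set $\{T(\vec{w}_1), T(\vec{w}_2), \ldots, T(\vec{w}_p)\}$ spans $\mathrm{Im}(T) = T(\mathbb{R}^n)$.
We will be finished if we can show that this set is also independent,
implying that $\{T(\vec{w}_1), T(\vec{w}_2), \ldots, T(\vec{w}_p)\}$ is a basis of $\mathrm{Im}(T) = T(\mathbb{R}^n)$ with
dimension p. Consider the following linear combination:

(9.19) $b_1 T(\vec{w}_1) + b_2 T(\vec{w}_2) + \cdots + b_p T(\vec{w}_p) = \vec{0}_m$

for scalar $b_j$ values, where $1 \le j \le p$. The set $\{T(\vec{w}_1), T(\vec{w}_2), \ldots, T(\vec{w}_p)\}$
is independent if all $b_j$ values must be 0 in order to satisfy (9.19). We
know that

(9.20) $b_1 T(\vec{w}_1) + b_2 T(\vec{w}_2) + \cdots + b_p T(\vec{w}_p) = T(b_1\vec{w}_1 + b_2\vec{w}_2 + \cdots + b_p\vec{w}_p)$

and substituting this into equation (9.19) gives that $b_1\vec{w}_1 + b_2\vec{w}_2 + \cdots + b_p\vec{w}_p \in \mathrm{Ker}(T)$.
This vector also lies in $\mathrm{Ker}(T)^\perp$, because it is a linear combination of the $\vec{w}_j$; thus
$b_1\vec{w}_1 + b_2\vec{w}_2 + \cdots + b_p\vec{w}_p \in \mathrm{Ker}(T) \cap \mathrm{Ker}(T)^\perp$, which,
by the second identity given in equation (9.18), implies that the linear
combination is the zero vector of $\mathbb{R}^n$:

$$b_1\vec{w}_1 + b_2\vec{w}_2 + \cdots + b_p\vec{w}_p = \vec{0}_n$$

Since $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_p\}$ is an independent set, $b_j = 0$ for $1 \le j \le p$. We can
now conclude that dim(Im(T)) + dim(Ker(T)) = n.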
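
The steps of the proof can be traced numerically for the map of Example 9.2.1. This is an illustrative sketch, not part of the original argument: the names A921, kerBasis, perpBasis, and images are hypothetical, and it relies on the fact that the vectors orthogonal to every kernel basis vector are exactly the null space of the matrix whose rows are those basis vectors.

```mathematica
(* Matrix of Example 9.2.1, entered row by row *)
A921 = {{1, -1, 1, 1}, {3, 5, 0, 8}, {1, 0, 7, 8}, {1, 1, 2, 4},
        {0, 6, 0, 6}, {1, 0, 2, 3}, {-8, 1, 11, 4}};

kerBasis = NullSpace[A921];          (* v_1, ..., v_k: basis of Ker(T) *)
perpBasis = NullSpace[kerBasis];     (* w_1, ..., w_p: basis of the orthogonal complement *)
images = (A921.# &) /@ perpBasis;    (* T(w_1), ..., T(w_p) *)

{Length[kerBasis] + Length[perpBasis] == 4,   (* k + p = n *)
 MatrixRank[images] == Length[perpBasis],     (* the T(w_j) are independent *)
 MatrixRank[images] == MatrixRank[A921]}      (* and they span Im(T) *)
(* {True, True, True} *)
```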


In the last part of this section, we will relate the kernel and image of a
linear map T to whether the linear map is either a one-to-one or onto
function. This will be very important in deciding whether a linear map T
has an inverse linear map $T^{-1}$, since a function only possesses an inverse
function when it is both one-to-one and onto.


Definition 9.2.3. A function F : D → R, with domain set D and range set
R, is said to be one-to-one if whenever F(x) = F(y) for x, y ∈ D, then x = y.


Definition 9.2.4. A function F : D → R is said to be onto if F(D) = R, that
is, if R = {F(x) | x ∈ D}.


Definition 9.2.5. A function F : D → R is said to be bijective if F is both
one-to-one and onto.


A one-to-one function F is one that never repeats its range values, while
an onto function F is one that attains all of its range values. If we consider
real-valued functions, with D, R ⊆ $\mathbb{R}$, then F : D → R is one-to-one if any
horizontal line crosses its graph at most once, and onto if each horizontal
line y = r, for r ∈ R, crosses the graph of F at least once (see Fig. 9.1).


Figure 9.1: The function sin(θ) is (a) onto, (b) one-to-one, and (c)
bijective.


As examples, $\sin : [0, 2\pi] \to [-1, 1]$ is not one-to-one, since sin(0) =
sin(2π), but it is onto since for each y ∈ [-1, 1], there is an angle θ ∈ [0, 2π]
with sin(θ) = y. Now if we alter this to $\sin : [0, \tfrac{\pi}{2}] \to [-1, 1]$, then it is
one-to-one but not onto. Finally, if we define $\sin : [-\tfrac{\pi}{2}, \tfrac{\pi}{2}] \to [-1, 1]$ then
our function is bijective. If you are familiar with the definition of arcsin,
then you will notice that the domain and range in the final example are the
principal range and domain, respectively, for arcsin. Also, note that in the
second definition, if we had limited the range to [0, 1], the function would
have been bijective. The graphs in Figure 9.1 illustrate the three different
situations of onto, one-to-one, and bijective. For a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$,
one-to-one and onto are related to its kernel and image subspaces,
respectively.


Theorem 9.2.2. A linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if $\mathrm{Ker}(T) = \{\vec{0}_n\}$,
and it is onto if $\mathrm{Im}(T) = \mathbb{R}^m$.

Proof. Another way of saying this is that T is one-to-one if dim(Ker(T)) =
0, and onto if dim(Im(T)) = m. Let us look at one-to-one first. If $T(\vec{u}) = T(\vec{v})$
for $\vec{u}, \vec{v} \in \mathbb{R}^n$, then $T(\vec{u}) - T(\vec{v}) = \vec{0}_m$. However, since T is
linear, this gives $T(\vec{u} - \vec{v}) = \vec{0}_m$. We can therefore conclude that
$\vec{u} - \vec{v} \in \mathrm{Ker}(T)$. So T is one-to-one exactly when $\mathrm{Ker}(T) = \{\vec{0}_n\}$. The onto
condition is automatic from the definition of onto and Im(T).


Note that the linear map $T : \mathbb{R}^4 \to \mathbb{R}^7$ given in Example 9.2.1 is neither
one-to-one nor onto.


Theorem 9.2.3. A linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is both one-to-one and onto
if and only if dim(Ker(T)) = 0 and dim(Im(T)) = m.


From Theorem 9.2.1, if T is both one-to-one and onto, then m = n. It does 
not work the other way around, since if m = n, we do not know anything 
about whether T is one-to-one or onto. It does follow though, that if m = n 
and T is one-to-one, then T is onto. Similarly, if m = n and T is onto, then 
T is one-to-one. 


Theorem 9.2.4. A general linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is both one-to-one
and onto if, and only if, m = n and the matrix $A \in \mathbb{R}^{n \times n}$ representing T
is invertible.


This follows from our discussion above and the fact that $T(\vec{x}) = A\vec{x}$
allows us to do some simple algebra: Solving $A\vec{x} = \vec{0}_n$ gives $\mathrm{Ker}(T) = \{\vec{0}_n\}$,
for a square matrix A, exactly when A has an inverse. Also, $A\vec{x} = \vec{y}$
has exactly one solution for each $\vec{y} \in \mathbb{R}^n$, for a square matrix A,
exactly when A has an inverse, which gives the image of T is $\mathbb{R}^n$ exactly
when A has an inverse.


Corollary 9.2.5. A linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if, and only if,
for any independent set of vectors $K = \{\vec{w}_1, \ldots, \vec{w}_k\}$ of $\mathbb{R}^n$ we have that
T(K) is an independent set of vectors in $\mathbb{R}^m$.


The next corollary follows if K is a set that spans all of $\mathbb{R}^n$.


Corollary 9.2.6. A linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if and only if
T sends a basis of $\mathbb{R}^n$ to a basis of the image $T(\mathbb{R}^n)$.


Next is a corollary involving the onto property.


Corollary 9.2.7. A linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is onto if and only if for any
spanning subset S of $\mathbb{R}^n$, we have that T(S) is a spanning subset of $\mathbb{R}^m$.


Combining Corollaries 9.2.6 and 9.2.7 yields the following corollary.


Corollary 9.2.8. A linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is bijective if and only if T
sends a basis of the domain $\mathbb{R}^n$ to a basis of the range $\mathbb{R}^m$.


In terms of matrix representation, Corollaries 9.2.5-9.2.7 yield the
following corollary.


Corollary 9.2.9. If $A \in \mathbb{R}^{m \times n}$ is the matrix representation of the linear
map $T : \mathbb{R}^n \to \mathbb{R}^m$, then T is onto if rank(A) = m, and one-to-one if
rank(A) = n.
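
The rank test of Corollary 9.2.9 is easy to automate. Below is a minimal sketch; classifyMap is a hypothetical helper, not a built-in function, and it relies only on the built-in MatrixRank and Dimensions.

```mathematica
(* Hypothetical helper: classify the map x -> A.x by comparing rank(A)
   with the number of columns n (one-to-one) and rows m (onto) *)
classifyMap[A_?MatrixQ] := Module[{m, n, r},
   {m, n} = Dimensions[A];
   r = MatrixRank[A];
   {"OneToOne" -> (r == n), "Onto" -> (r == m)}];

(* Matrix of Example 9.2.1: rank 3 with n = 4, m = 7, so neither property holds *)
classifyMap[{{1, -1, 1, 1}, {3, 5, 0, 8}, {1, 0, 7, 8}, {1, 1, 2, 4},
   {0, 6, 0, 6}, {1, 0, 2, 3}, {-8, 1, 11, 4}}]
(* {"OneToOne" -> False, "Onto" -> False} *)

(* Matrix of Example 9.1.4: rank 3 with n = 3, m = 4 *)
classifyMap[{{5, -1, 1}, {1, 2, 1}, {7, 0, -1}, {0, 1, 4}}]
(* {"OneToOne" -> True, "Onto" -> False} *)

(* An invertible 2 x 2 matrix gives a bijective map *)
classifyMap[{{2, 1}, {1, 1}}]
(* {"OneToOne" -> True, "Onto" -> True} *)
```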


Homework Problems 


1. Each of the following matrices represents a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$.
Compute both the Im(T) and Ker(T) for each map, expressing
your answer in terms of basis vectors:


@| 5 a] wf 3) else a 
2-8 -2 5 -—-2 -l 

13 0 -2 

(d) —4 16 1 (e) —4 2 l of | 

È z È Ti Say 




2. For each of the maps from problem 1, verify that Theorem 9.2.1 
holds. 


3. Classify each map from problem 1 as one-to-one, onto, or 
bijective; if the map does not satisfy any of the given properties, 
then state so. 


4. Compute both the Im(7) and Ker(7) for each of the following 
maps: 


(a) T((x,y)) = (x, —2, 2) 
(b) T((x,y)) = (y, £, y, 2) 
(c) T((x,y,2)) = (£ +y,y — 2,2 — y) 


(d) T((z,y,z)) =z+y+z 


5. Classify each map from problem 4 as one-to-one, onto, or 
bijective; if the map does not satisfy any of the given properties, 
then state so. 


6. In problems 1 and 4, find a basis of $\mathrm{Ker}(T)^\perp$ and show that
$\dim(\mathrm{Ker}(T)^\perp) = \dim(\mathrm{Im}(T))$.

7. Prove that a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is not one-to-one if m < n.


8. Prove that a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is not onto if m > n.


9. (See homework problem 11 of Section 9.1). Let $T : \mathbb{R}^n \to \mathbb{R}^m$
and $S : \mathbb{R}^m \to \mathbb{R}^k$ be two linear maps. Assume that their composite
$S \circ T : \mathbb{R}^n \to \mathbb{R}^k$ is also a linear map.

(a) Let $S \circ T$ be one-to-one. Then must both S and T be
one-to-one? If yes, explain why. If no, then give an example of
why not.


(b) Let $S \circ T$ be onto. Then must both S and T be onto? If yes,
explain why. If no, then give an example of why not.


10. Let A be the n x n matrix representing the linear map $T : \mathbb{R}^n \to \mathbb{R}^n$
through multiplication by A.

(a) Explain how the columns of the matrix A are found and
what they are in the range $\mathbb{R}^n$.


(b) Explain why $\det(A) \neq 0$ if and only if T sends a basis to a
basis.


Mathematica Problems 


1. Each of the following matrices represents a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$.
Compute both the Im(T) and Ker(T) for each map. Also, find a
basis and the dimension for both Im(T) and Ker(T).


j 1 -3 -5 

1 30 -2 a k F E 5 Ir 3 

(a) | -4 2 1 3 | (b) (c) | -1 E A 
0-21 4 b 

-2 1 4 a ee 3 3 -1 

-1 3 5 

1 -3 -5 

GC if 8 

1 30 3 > ie -1 7 2 

-4 21 ; ; 3 3 -1 

ai 5 _5 6 (e) i$ ys. 3 : Olay o s 

3 -2 1 Ta i 3 -1 8 3 

: 6 3 -1 

-1 -9 0 


2. Compute the rank of each of the following matrices to determine 
if the maps defined by them are one-to-one: 


1 i 3-33 Ls oe 
wofa >| w 5 0 3 ()| 5 0 3 
-4 -3 -5 AS -5 
$-22 S 3 -8 2 10 
9 -2 6 -14 0-20 4 
@)3 62 6| li 33 -7 
0-20 4 3-2 2 -2 




piaga $3 1 
0 1 3 -4 0 
Qi fab E wh d 
=% 7 13 -0 2 
3 ae 3 


3. For the following matrix A, which represents a linear map $T : \mathbb{C}^4 \to \mathbb{C}^5$
by multiplication by A, find a basis and the dimension for
both Im(T) and Ker(T):


3-t Si 2+i 8 
—7+t 3 i —bi 


A= 1 i l+i 0 
i —1 0 2i 
S+t —i l i 


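9.3 Composites of Two Linear Maps and Inverses


Next, we will focus our attention on the composition of two linear maps,
$T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^m \to \mathbb{R}^l$, denoted $S \circ T$. By definition of the
composition of functions, $S \circ T : \mathbb{R}^n \to \mathbb{R}^l$ where $(S \circ T)(\vec{x}) = S(T(\vec{x}))$,
for all $\vec{x} \in \mathbb{R}^n$. The composition $S \circ T$ is a linear map, since

$$\begin{aligned}
(S \circ T)(a\vec{u} + b\vec{v}) &= S(T(a\vec{u} + b\vec{v})) \\
&= S(a\,T(\vec{u}) + b\,T(\vec{v})) \\
&= a\,S(T(\vec{u})) + b\,S(T(\vec{v}))
\end{aligned}$$

for all $\vec{u}, \vec{v} \in \mathbb{R}^n$ and scalars a, b.


If the m x n matrix A represents T and the l x m matrix B represents S,
then BA is the l x n matrix that represents the composite linear map $S \circ T$.
In order to see this, let $T(\vec{x}) = A\vec{x}$ and $S(\vec{y}) = B\vec{y}$ for $\vec{x} \in \mathbb{R}^n$ and
$\vec{y} \in \mathbb{R}^m$. Then

$$\begin{aligned}
(S \circ T)(\vec{x}) &= S(T(\vec{x})) \\
&= S(A\vec{x}) \\
&= B(A\vec{x}) \\
&= (BA)(\vec{x})
\end{aligned}$$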

Example 9.3.1. Let us do an example to see that this is correct. Let T : R: => R and 
S: R > R 3 be given, respectively, by 


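\[
\begin{aligned}
T((x,y,z,w)) = (&x - y + z + w,\; 3x + 5y - 2w,\; x + 7z + 3w, \\
&x + y + 4z,\; 6y - 9w,\; x + 2z,\; -8x + y + 11z)
\end{aligned}
\]

\[
\begin{aligned}
S((a,b,c,d,e,f,g)) = (&-6a + 3c - 2d + f - 9g,\; a - 5b + c - d + e + f - g, \\
&7a + 3b - 4c + 5d - e - f + 8g)
\end{aligned}
\]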


We will have Mathematica verify that $S \circ T$ can be represented as the $3 \times 4$ matrix $C$, with
$C = BA$ for $B \in \mathbb{R}^{3 \times 7}$ representing $S$, and $A \in \mathbb{R}^{7 \times 4}$ representing $T$. Pay special
attention to how the matrix CMat is constructed, as the Transpose command must be
used since the maps T and S are defined as row vectors using Mathematica. This same
situation was encountered in the previous section.


T[{x_, y_, z_, w_}] := {x - y + z + w, 3 x + 5 y - 2 w, x + 7 z + 3 w, x +
y + 4 z, 6 y - 9 w, x + 2 z, -8 x + y + 11 z};

S[{a_, b_, c_, d_, e_, f_, g_}] := {-6 a + 3 c - 2 d + f - 9 g, a - 5 b +
c - d + e + f - g, 7 a + 3 b - 4 c + 5 d - e - f + 8 g};

Expand[S[T[{x, y, z, w}]]]


{3 w + 68 x - 5 y - 90 z, 5 w - 5 x - 22 y - 5 z, -2 w - 48 x + 15 y + 85 z}


(CMat = Transpose[{S[T[{1, 0, 0, 0}]], S[T[{0, 1, 0, 0}]], S[T[{0, 0, 1, 0}]], S[T[{0, 0,
0, 1}]]}]) // MatrixForm


68 -5 -90 3
-5 -22 -5 5
-48 15 85 -2


So we first calculated CMat using Mathematica to compute the composition of functions
directly. We now construct CMat through matrix multiplication and show that indeed
CMat = BA:

(A = Transpose[{T[{1, 0, 0, 0}], T[{0, 1, 0, 0}], T[{0, 0, 1, 0}], T[{0, 0, 0, 1}]}]) //
MatrixForm

1 -1 1 1
3 5 0 -2
1 0 7 3
1 1 4 0
0 6 0 -9
1 0 2 0
-8 1 11 0


(B = Transpose[{S[{1, 0, 0, 0, 0, 0, 0}], S[{0, 1, 0, 0, 0, 0, 0}], S[{0, 0, 1, 0, 0, 0, 0}], S[{0,
0, 0, 1, 0, 0, 0}], S[{0, 0, 0, 0, 1, 0, 0}], S[{0, 0, 0, 0, 0, 1, 0}], S[{0, 0, 0, 0, 0, 0, 1}]}]) // MatrixForm


-6 0 3 -2 0 1 -9
1 -5 1 -1 1 1 -1
7 3 -4 5 -1 -1 8


B.A // MatrixForm


68 -5 -90 3
-5 -22 -5 5
-48 15 85 -2


Once the concept of compositions of linear maps has been introduced, it is 
only natural to consider inverse maps. We begin with the definition of a 
function being invertible. 




Definition 9.3.1. A function $F : D \to R$ is said to be invertible, or have an
inverse function, if there exists a function $G : R \to D$ such that both $F \circ G : R \to R$ and $G \circ F : D \to D$ are the identity functions, $I_R$ and $I_D$, on $R$
and $D$, respectively.


This means that both $(F \circ G)(r) = r$ and $(G \circ F)(d) = d$, for all $r \in R$ and $d \in D$. $G$ and $F$ are said to be inverse functions of each other, and both are
said to be invertible functions.


Here are a few examples of functions, and their inverses, along with 
corresponding domains and ranges: 


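\[
\begin{aligned}
F(x) &= x^2, & F &: (-\infty, 0] \to [0, \infty) \\
G(y) &= -\sqrt{y}, & G &: [0, \infty) \to (-\infty, 0] \\[6pt]
F(x) &= e^x, & F &: \mathbb{R} \to (0, \infty) \\
G(y) &= \ln y, & G &: (0, \infty) \to \mathbb{R} \\[6pt]
F(x) &= \tan(x), & F &: \left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right) \to \mathbb{R} \\
G(y) &= \arctan(y), & G &: \mathbb{R} \to \left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right)
\end{aligned}
\]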

For each of these examples, with the specified domains, we can show that
$(G \circ F)(x) = x$ and $(F \circ G)(y) = y$, for all $x$ and $y$ in the domain and range
of $F$, respectively. We now connect the concept of invertibility to the
concepts of one-to-one and onto.


Theorem 9.3.1. A function $F : D \to R$ has an inverse function $G : R \to D$
if and only if $F$ (as well as $G$) is both one-to-one and onto, that is, $F$ is
bijective or a one-to-one correspondence.


Proof. To prove this theorem, we consider the following argument: Let $F$
and $G$ be an inverse function pair. Then $G \circ F = I_D$, and for $d \neq c$ two
distinct elements of $D$, we have $(G \circ F)(d) = G(F(d)) = d$, while $(G \circ F)(c) = G(F(c)) = c$. If $F(d) = F(c)$, then applying $G$ to both sides would give $d = c$, a contradiction. So $F(d) \neq F(c)$, and therefore we can conclude that $F$ is
one-to-one.


Now, let $r \in R$. Then $F(G(r)) = (F \circ G)(r) = I_R(r) = r$, and so $F$ is onto. If
we now switch and let $F$ be both one-to-one and onto, then for any $r \in R$
we define $G(r) = d$ in $D$, where $F(d) = r$. We know that $d$ exists and is
unique since $F$ is one-to-one and onto. Therefore both $F \circ G = I_R$ and $G \circ F = I_D$, giving $G$ as the inverse to $F$.


From the previous discussion, we know that a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$
has an inverse function $S : \mathbb{R}^m \to \mathbb{R}^n$ if, and only if, $n = m$ and $\dim(\mathrm{Ker}(T)) = 0$, or $\dim(\mathrm{Im}(T)) = n$. Equivalently, a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ with the $m \times n$ matrix $A$ representing it has an inverse linear map $S : \mathbb{R}^m \to \mathbb{R}^n$ if, and only if, $A^{-1}$ exists, in which case the matrix representing
the inverse linear map $S$ is $A^{-1}$. To see this, we would require that $(S \circ T)(\vec{x}) = \vec{x}$, but this is represented in matrix form as $A^{-1}A\vec{x}$, which is, of course,
just $\vec{x}$. The same argument holds for $(T \circ S)(\vec{y}) = \vec{y}$, since $AA^{-1}\vec{y} = \vec{y}$.
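
The invertibility criterion above can be checked directly in Mathematica. The following is a minimal sketch, assuming a hypothetical $3 \times 3$ matrix M that does not appear in the text and was chosen only for illustration; it relies on the built-in commands MatrixRank, NullSpace, and Inverse.

```mathematica
(* Hypothetical matrix, chosen only for illustration *)
M = {{2, 1, 0}, {0, 1, 3}, {1, 0, 1}};

(* The map is invertible exactly when the rank equals n *)
MatrixRank[M] == Length[M]

(* An empty list here means that Ker(T) contains only the zero vector *)
NullSpace[M]

(* The inverse map is represented by the inverse matrix; Minv is our own name *)
Minv = Inverse[M];
{Minv.M == IdentityMatrix[3], M.Minv == IdentityMatrix[3]}
```

Since the determinant of this particular M is 5, the rank test should return True and both identity checks should hold. For a singular matrix, Inverse would instead issue a warning, which is why the rank test comes first in this sketch.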


One very useful way to uniquely define a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is by
taking a basis $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ of its domain $\mathbb{R}^n$, and then
providing the values

\[
T(B) = \{T(\vec{v}_1), T(\vec{v}_2), \ldots, T(\vec{v}_n)\}
\]

The question then arises as to what we know about the linear map $T$ if we
know something about the set $T(B)$. Theorem 9.3.2 is one possible answer
to this question.


Theorem 9.3.2. A linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ is invertible if, and only if,
for any basis $B$ of $\mathbb{R}^n$, we have that $T(B)$ is also a basis of $\mathbb{R}^n$.
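
Theorem 9.3.2 can be illustrated with a quick rank computation. This is only a sketch under stated assumptions: the matrices M and singular and the list basis are hypothetical choices of ours, not data from the text, and a set of two vectors is taken to be a basis of $\mathbb{R}^2$ when the matrix having those vectors as rows has rank 2.

```mathematica
(* Hypothetical invertible matrix and basis, for illustration only *)
M = {{1, 2}, {3, 5}};
basis = {{1, 1}, {1, -1}};

(* Apply the map to each basis vector *)
image = Map[M.# &, basis];

(* Both ranks equal 2, so basis and its image are bases of R^2 *)
{MatrixRank[basis], MatrixRank[image]}

(* A singular matrix sends the same basis onto a single line (rank 1) *)
singular = {{1, 2}, {2, 4}};
MatrixRank[Map[singular.# &, basis]]
```

Here the image of the basis under M is {{3, 8}, {-1, -2}}, which is again linearly independent, while the singular matrix produces two parallel vectors.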


Example 9.3.2. Now, let us do an example of finding the matrix $A$ representing an
invertible linear map $T$, where $T$ is defined by sending a basis $B_V$ to a basis $B_W$. We also
will show that $A$ has an inverse matrix, which corresponds to the inverse map $S$. Let the
linear map $T : \mathbb{R}^4 \to \mathbb{R}^4$ be defined by


T((-1, 3, 1,5)) = (2,—4, 1,0) 
T((6, —2,0, —4)) = (1,3, —7, 2) 
T((9, 11, —2,8)) = (—5,6, —2, 1) 
T((—4, —9,3,1)) = (0,7, —2, —10) 




From Section 9.1, we learned that the matrix $A$ representing $T$ is given by $A = WV^{-1}$, where
$V$ is the matrix whose columns are the elements of the basis $B_V$ and $W$ is the matrix
whose columns are the elements of the basis $B_W$.


v1 = {-1, 3, 1, 5}; v2 = {6, -2, 0, -4};
v3 = {9, 11, -2, 8}; v4 = {-4, -9, 3, 1};
(Vrow = {v1, v2, v3, v4}) // MatrixForm


-1 3 1 5
6 -2 0 -4
9 11 -2 8
-4 -9 3 1


RowReduce[Vrow] // MatrixForm


1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1


w1 = {2, -4, 1, 0}; w2 = {1, 3, -7, 2};
w3 = {-5, 6, -2, 1}; w4 = {0, 7, -2, -10};
(Wrow = {w1, w2, w3, w4}) // MatrixForm


RowReduce[Wrow] // MatrixForm


1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1




(A = Transpose[Wrow].Inverse[Transpose[Vrow]]) // MatrixForm


-25/132 41/33 1019/264 -305/264
389/396 -271/99 -4171/792 1657/792
-127/132 5/33 -415/264 61/264
-13/99 179/99 485/198 -317/198


Now that we have A, we check to see if it is invertible and that it satisfies the four vector 
equations used to define T: 


Det[A]
161/264


{A.v1, A.v2, A.v3, A.v4} // MatrixForm


2 -4 1 0
1 3 -7 2
-5 6 -2 1
0 7 -2 -10


As one can see, the vectors given as rows in this last matrix do agree with the rows of
the following matrix, that is, with the basis vectors $\vec{w}_1$ through $\vec{w}_4$ that form the columns of $W$:


Wrow // MatrixForm
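
The example announced that $A$ has an inverse matrix corresponding to the inverse map $S$. One way to confirm this is sketched below, restating the data of Example 9.3.2 so that the block runs on its own; the name Ainv is our own and is not used elsewhere in the text.

```mathematica
(* Basis vectors from Example 9.3.2, stored as rows *)
Vrow = {{-1, 3, 1, 5}, {6, -2, 0, -4}, {9, 11, -2, 8}, {-4, -9, 3, 1}};
Wrow = {{2, -4, 1, 0}, {1, 3, -7, 2}, {-5, 6, -2, 1}, {0, 7, -2, -10}};
A = Transpose[Wrow].Inverse[Transpose[Vrow]];

(* Matrix of the inverse map S *)
Ainv = Inverse[A];

(* S should send each w_k back to the corresponding v_k *)
Table[Ainv.Wrow[[k]] == Vrow[[k]], {k, 1, 4}]

(* Both composites should be the identity map *)
{Ainv.A == IdentityMatrix[4], A.Ainv == IdentityMatrix[4]}
```

Because all entries are exact rationals, each comparison should evaluate to True without any rounding concerns.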


Homework Problems 


1. For each of the following pairs of maps, determine which order, 
if possible, S and T can be composed in: 




(a) T : R? > RË (b) T : R? > RË (T:R >R 
S: R? — R? SRE S:R=> R? 


(d) T : R > R? (e) T:R > R? (f) T : R? > R? 
S:R > R? S:R'oR S: R? > R? 


2. For each of the following pairs of maps, compute $S \circ T$ without
using matrices:


(a) $T((x,y)) = x + y$, $S(a) = (a, -a)$
(b) $T(x) = (x, -x, x)$, $S((a,b,c)) = (a + b, a - c)$
(c) $T((x,y)) = (-x, y, x + y)$, $S((a,b,c)) = (-a, b)$
(d) $T(x) = (3x, 2x, -4x)$, $S((a,b,c)) = a + b + c$


3. The compositions from problem 2 should have yielded two such
that $(S \circ T)(\vec{v}) = \vec{v}$, for all $\vec{v}$ in the domain of $T$. Which two
are they?

4. The two compositions found in problem 3 highlight an important
sticking point in Definition 9.3.1 and Theorem 9.3.1. Verify that $(T \circ S)(\vec{w}) = \vec{w}$ does not hold for all $\vec{w}$ in the domain of $S$. Why is this the
case?

5. Determine the linear map given by each of the following 
matrices: 


-1 0 l 
wj 0 1 of] 
0 0 
6. Four of the matrices from problem 5 correspond to linear maps, S 


or T, from problem 2. Find them and state to which maps they 
correspond. 




7. If the matrix C, which represents a composite of two linear maps 
through multiplication by C, has a nonzero determinant, then must 
the same be true for each of the two matrices A and B, which 
represent the individual linear maps? If yes, explain why. If not, 
give an example to verify your answer. 


8. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map that is invertible as a function.
Show directly that its inverse function, $T^{-1} : \mathbb{R}^n \to \mathbb{R}^n$, is also a
linear map.


Mathematica Problems 


1. Compute the rank of each of the following matrices to determine 
whether the corresponding linear map has an inverse: 


E oe 
(a) E A (b) | 2 3 1 
451 
i Sr a 1. oe og 
g Set os a an ae 
()} 9 -3 1 ~2 d] gah ee l 
t Ses aD 0 Gd 
=$ 5 5 7 i aa a 
=| 0 6 =8 S & =< & 
(e) 4-4 -4 1 (f) ¢ =10 4f = 
0 1 0 0 -13 14 -15 16 


2. For each of the linear maps found to be invertible from problem 
1, compute the inverse map. 

3. Graph each of the following pairs of functions on an appropriate 
domain, along with the graph of y = x. As discussed in this section, 
these pairs of functions are inverses of each other on the specified 


domain. Explain geometrically the relationship between the 
functions and the line y = x. 


(a) $F(x) = x^2$, $G(x) = -\sqrt{x}$ (b) $F(x) = e^x$, $G(x) = \ln(x)$ (c) $F(x) = \tan(x)$, $G(x) = \arctan(x)$




4. A curve $y = f(x)$ can be regarded as the collection of points $(x, f(x))$ for all $x$ in the domain of $f$. Using this information, can you
geometrically construct the inverse to a bijective function $y = f(x)$
through matrix multiplication? (Hint: There are two reasonable
approaches; however, if you consider the curve to lie in the $xy$-plane
in $\mathbb{R}^3$, then you can simply rotate the curve about a line that passes
through the origin in $\mathbb{R}^3$ by a fixed angle.)

5. For the matrix $A$ below that represents a linear map
$T : \mathbb{C}^4 \to \mathbb{C}^4$ by multiplication by $A$, find the matrix $B$ that
represents its inverse map. Also check, on the level of the respective
two linear maps, that these are inverse functions.

\[
A = \begin{pmatrix}
3-i & i & 2+i & 8 \\
-7+i & 3 & i & -6i \\
1 & i & 1+i & 0 \\
i & -i & 0 & 2i
\end{pmatrix}
\]


9.4 Change of Bases for the 
Matrix Representation of a 
Linear Map 


A linear map $T : \mathbb{R}^n \to \mathbb{R}^m$ is usually represented by the $m \times n$ matrix $A$
where the standard bases $S_n$ of $\mathbb{R}^n$ and $S_m$ of $\mathbb{R}^m$ are used to write out
the column vectors of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. This means that $\vec{y} = T(\vec{x}) = A\vec{x}$, where the column vectors $\vec{x}$ of $\mathbb{R}^n$, and $\vec{y}$ of $\mathbb{R}^m$, are written
out as linear combinations in the standard bases; that is, if $\vec{x} = (x_1, x_2, \ldots, x_n)$ as a row, then

\[
\vec{x} = x_1(1,0,\ldots,0) + x_2(0,1,0,\ldots,0) + \cdots + x_n(0,0,\ldots,0,1)
\]




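What if we want to apply linear maps, not using the standard bases $S_n$ of
$\mathbb{R}^n$ and $S_m$ of $\mathbb{R}^m$, but instead with a pair of different bases $B_n = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ of $\mathbb{R}^n$ and $B_m = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_m\}$ of $\mathbb{R}^m$? What we seek
is an $m \times n$ matrix $A'$ where $\vec{y} = T(\vec{x}) = A'\vec{x}$ for column vectors $\vec{x}$ of
$\mathbb{R}^n$ and $\vec{y}$ of $\mathbb{R}^m$ written out as linear combinations in the new bases.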


Before we can fully understand what this means, we will need to introduce
some notation. When we express a vector $\vec{x}$ as a linear combination of
vectors in a basis $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$, we mean that

\[
\vec{x} = x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_n\vec{v}_n \tag{9.21}
\]


If we are dealing with more than one basis, we now have a way to
recognize the different potential representations of a vector using different
bases. For instance, as long as we have $n$ linearly independent vectors $C = \{\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_n\}$ in $\mathbb{R}^n$, we know that they form a basis for $\mathbb{R}^n$. Hence,
any vector $\vec{x}$ can be written as a linear combination of vectors in the basis
$C$. So, for an arbitrary $\vec{x}$, we obtain

\[
\vec{x} = c_1\vec{z}_1 + c_2\vec{z}_2 + \cdots + c_n\vec{z}_n \tag{9.22}
\]


for some scalars $c_1$ through $c_n$. It should be fairly obvious that the
coefficients in the expansions of a vector using two different bases will
not be the same. Clearly, if $\vec{z}_k \neq \vec{v}_j$ for all $1 \le j, k \le n$ when comparing
two bases, then one may assume that most likely, $c_k \neq x_k$ for all $k$ as well.
But we know that

\[
x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_n\vec{v}_n = c_1\vec{z}_1 + c_2\vec{z}_2 + \cdots + c_n\vec{z}_n \tag{9.23}
\]


Thus, to avoid confusion, if we are not using the standard basis $S_n$, we will
place the name of the basis as a subscript on our vectors to denote which
basis the vectors have been expanded in. For instance, equation 9.22 can
be expressed in the following notation:

\[
\vec{x}_C = c_1\vec{z}_1 + c_2\vec{z}_2 + \cdots + c_n\vec{z}_n = (c_1, c_2, \ldots, c_n)_C \tag{9.24}
\]




If there is no subscript on the vector, we will assume that the vector is 
represented as a linear combination of the vectors in the standard basis. 


Example 9.4.1. To clarify this idea, we now consider an example involving the following
basis for $\mathbb{R}^3$:

\[
C = \{\vec{z}_1 = (-1,5,2),\; \vec{z}_2 = (3,-4,7),\; \vec{z}_3 = (8,1,-9)\}
\]

It is easy to show that

\[
(10,2,0) = \vec{z}_1 + \vec{z}_2 + \vec{z}_3 = (1,1,1)_C
\]
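
The coordinates of a vector in a nonstandard basis can also be found by solving a linear system. The sketch below uses the basis $C$ of Example 9.4.1; the names Z and coords are our own, and LinearSolve is the built-in command for solving a matrix equation.

```mathematica
(* Basis vectors of C from Example 9.4.1 *)
z1 = {-1, 5, 2}; z2 = {3, -4, 7}; z3 = {8, 1, -9};

(* The columns of Z are the basis vectors *)
Z = Transpose[{z1, z2, z3}];

(* Solve Z.c == x for the coordinate vector c of x = (10, 2, 0) *)
coords = LinearSolve[Z, {10, 2, 0}]

(* Rebuild the vector from its coordinates as a check *)
coords.{z1, z2, z3}
```

The first result should be {1, 1, 1}, in agreement with $(1,1,1)_C$, and the second should reproduce {10, 2, 0}.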


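We now turn our attention back to finding the matrix $A'$. If we have the
following

\[
\vec{x}_{B_n} = (x_1, x_2, \ldots, x_n)_{B_n}, \qquad \vec{y}_{B_m} = (y_1, y_2, \ldots, y_m)_{B_m}
\]

then we now know, respectively, that

\[
\vec{x}_{B_n} = x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_n\vec{v}_n, \qquad \vec{y}_{B_m} = y_1\vec{w}_1 + y_2\vec{w}_2 + \cdots + y_m\vec{w}_m
\]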


Since $T$ is a linear map, we have that

\[
T(\vec{x}_{B_n}) = x_1T(\vec{v}_1) + x_2T(\vec{v}_2) + \cdots + x_nT(\vec{v}_n) \tag{9.25}
\]


Furthermore, since $T(\vec{v}_j) \in \mathbb{R}^m$, for $1 \le j \le n$, we can express each
vector as a linear combination of the basis vectors in $B_m$:

\[
T(\vec{v}_j) = c_{1,j}\vec{w}_1 + c_{2,j}\vec{w}_2 + \cdots + c_{m,j}\vec{w}_m \tag{9.26}
\]

for each $1 \le j \le n$, and scalar $c_{k,j}$ terms with $1 \le j \le n$ and $1 \le k \le m$. The
claim now is that $A'_{k,j} = c_{k,j}$, where the $c_{k,j}$ terms are defined as in (9.26).
To see this, we start at the beginning:


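\[
\begin{aligned}
T(\vec{x}_{B_n}) &= \sum_{j=1}^{n} x_j T(\vec{v}_j) \\
&= \sum_{j=1}^{n} x_j \left( \sum_{k=1}^{m} c_{k,j}\vec{w}_k \right) \\
&= \sum_{k=1}^{m} \left( \sum_{j=1}^{n} x_j c_{k,j} \right) \vec{w}_k \\
&= \sum_{k=1}^{m} a_k \vec{w}_k
\end{aligned}
\]

In the last step of this chain of equalities, we set $a_k = \sum_{j=1}^{n} x_j c_{k,j}$ to
make it more clear that we have a linear combination of basis elements of
$B_m$. We therefore now have the following:

\[
T(\vec{x}_{B_n}) = (a_1, a_2, \ldots, a_m)_{B_m}
= \begin{pmatrix}
c_{1,1} & c_{1,2} & \cdots & c_{1,n} \\
c_{2,1} & c_{2,2} & \cdots & c_{2,n} \\
\vdots & \vdots & & \vdots \\
c_{m,1} & c_{m,2} & \cdots & c_{m,n}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= A'\vec{x}_{B_n} \tag{9.27}
\]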


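Now, unfortunately this tells us what form the matrix $A'$ takes, but we still
do not know the specific values of $c_{k,j}$. To find them, refer to equation (9.26).
Notice that for all $1 \le k \le n$, we have

\[
T(\vec{v}_k) = c_{1,k}\vec{w}_1 + c_{2,k}\vec{w}_2 + \cdots + c_{m,k}\vec{w}_m
= \Big[\, \vec{w}_1 \,\Big|\, \vec{w}_2 \,\Big|\, \cdots \,\Big|\, \vec{w}_m \,\Big]
\begin{pmatrix} c_{1,k} \\ c_{2,k} \\ \vdots \\ c_{m,k} \end{pmatrix} \tag{9.28}
\]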


Applying this to all the vectors allows us to construct the following matrix
equation:

\[
\Big[\, T(\vec{v}_1) \,\Big|\, T(\vec{v}_2) \,\Big|\, \cdots \,\Big|\, T(\vec{v}_n) \,\Big]
= \Big[\, \vec{w}_1 \,\Big|\, \vec{w}_2 \,\Big|\, \cdots \,\Big|\, \vec{w}_m \,\Big] A' \tag{9.29}
\]

This says that the matrix $A' = W^{-1}T(V)$, where $W$ is the matrix whose
columns are the basis elements of $B_m$ and $T(V)$ is the matrix whose
columns are results of applying $T$ to the basis elements of $B_n$. We also
have $A' = W^{-1}AV$, where $V$ is the matrix whose columns are the elements
of the basis $B_n$, since the matrix $T(V) = AV$ for $A$ the matrix representing $T$
in the standard bases.


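Example 9.4.2. Of course, now we must do an example to verify our reasoning above.
Let $T : \mathbb{R}^3 \to \mathbb{R}^5$ be the map given as follows:

\[
T((x,y,z)) = (5x - 3y + 2z,\; x - y - z,\; -7x + z,\; 6y - 11z,\; 8x + y - 5z)
\]

whose corresponding matrix form is given by

\[
A\vec{v} = \begin{pmatrix}
5 & -3 & 2 \\
1 & -1 & -1 \\
-7 & 0 & 1 \\
0 & 6 & -11 \\
8 & 1 & -5
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
\]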


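In our case, we wish to find the $5 \times 3$ matrix $A'$ representing $T$ in the two bases

\[
B_3 = \{\vec{v}_1 = (-1,5,2),\; \vec{v}_2 = (3,-4,7),\; \vec{v}_3 = (8,1,-9)\}
\]

of $\mathbb{R}^3$ and

\[
\begin{aligned}
B_5 = \{&\vec{w}_1 = (1,-1,3,-2,5),\; \vec{w}_2 = (-4,2,0,1,-3),\; \vec{w}_3 = (-2,1,5,-7,2), \\
&\vec{w}_4 = (0,9,1,-2,1),\; \vec{w}_5 = (6,8,1,-1,-3)\}
\end{aligned}
\]

of $\mathbb{R}^5$. Our matrix $A'$ must satisfy

\[
T(\vec{v}_1) \cong A' \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad
T(\vec{v}_2) \cong A' \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad
T(\vec{v}_3) \cong A' \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\]

where $\cong$ denotes when each $T(\vec{v}_j)$ is written as a linear combination of the $\vec{w}_j$
vectors from the basis $B_5$. Then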
T[{x_, y_, z_}] := {5 x - 3 y + 2 z, x - y - z, -7 x + z, 6 y - 11 z, 8 x +
y - 5 z};
v1 = {-1, 5, 2}; v2 = {3, -4, 7}; v3 = {8, 1, -9};
T[v1]

{-16, -8, 9, 8, -13}


w1 = {1, -1, 3, -2, 5}; w2 = {-4, 2, 0, 1, -3}; w3 = {-2, 1, 5, -7, 2}; w4 = {0, 9, 1, -2,
1}; w5 = {6, 8, 1, -1, -3};


(W = Transpose[{w1, w2, w3, w4, w5}]) // MatrixForm


1 -4 -2 0 6
-1 2 1 9 8
3 0 5 1 1
-2 1 -7 -2 -1
5 -3 2 1 -3


(TV = Transpose[{T[v1], T[v2], T[v3]}]) // MatrixForm


-16 41 19
-8 0 16
9 -14 -65
8 -101 105
-13 -15 110

(APrime = Inverse[W].TV) // MatrixForm


11217/2608 -46945/1304 43531/2608
42179/5216 -93259/2608 -34815/5216
-2031/5216 48327/2608 -126789/5216
-19997/5216 16117/2608 97185/5216
4897/2608 -6297/1304 -31733/2608


(C1 = APrime.{{1}, {0}, {0}}) // MatrixForm


11217/2608
42179/5216
-2031/5216
-19997/5216
4897/2608


C1[[1,1]] w1 + C1[[2,1]] w2 + C1[[3,1]] w3 + C1[[4,1]] w4 + C1[[5,1]] w5
{-16, -8, 9, 8, -13}

T[v1]

{-16, -8, 9, 8, -13}

(C2 = APrime.{{0}, {1}, {0}}) // MatrixForm


-46945/1304
-93259/2608
48327/2608
16117/2608
-6297/1304


C2[[1,1]] w1 + C2[[2,1]] w2 + C2[[3,1]] w3 + C2[[4,1]] w4 + C2[[5,1]] w5
{41, 0, -14, -101, -15}

T[v2]

{41, 0, -14, -101, -15}


(C3 = APrime.{{0}, {0}, {1}}) // MatrixForm


43531/2608
-34815/5216
-126789/5216
97185/5216
-31733/2608


C3[[1,1]] w1 + C3[[2,1]] w2 + C3[[3,1]] w3 + C3[[4,1]] w4 + C3[[5,1]] w5
{19, 16, -65, 105, 110}

T[v3]

{19, 16, -65, 105, 110}


So $A' = W^{-1}T(V)$, where $W$ is the matrix whose columns are the basis elements of $B_5$ and
$T(V)$ is the matrix whose columns are $T$ applied to the basis elements of $B_3$.


Now, we also can write the matrix $A'$ in terms of the matrix $A$ that represents the linear
map $T$ in the standard bases and the two matrices whose columns consist of the new basis
elements. We have $A' = W^{-1}AV$, where $V$ is the matrix whose columns are the elements of
the basis $B_3$, since the matrix $T(V) = AV$:


(A = Transpose[{T[{1, 0, 0}], T[{0, 1, 0}], T[{0, 0, 1}]}]) // MatrixForm


5 -3 2
1 -1 -1
-7 0 1
0 6 -11
8 1 -5


(V = Transpose[{v1, v2, v3}]) // MatrixForm


-1 3 8
5 -4 1
2 7 -9


A.V // MatrixForm


-16 41 19
-8 0 16
9 -14 -65
8 -101 105
-13 -15 110


(TV - A.V) // MatrixForm


0 0 0
0 0 0
0 0 0
0 0 0
0 0 0


(NewAPrime = Inverse[W].A.V) // MatrixForm


11217/2608 -46945/1304 43531/2608
42179/5216 -93259/2608 -34815/5216
-2031/5216 48327/2608 -126789/5216
-19997/5216 16117/2608 97185/5216
4897/2608 -6297/1304 -31733/2608


(NewAPrime - APrime) // MatrixForm


0 0 0
0 0 0
0 0 0
0 0 0
0 0 0




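Clearly, this method works. Next, we will try to illustrate the derivation of
the formula $A' = W^{-1}AV$ in a much simpler way. Mathematicians have
come up with a very nice tool to help visualize this derivation process.
The following is an example of a commutative diagram:

\[
\begin{array}{ccc}
\vec{x}_{B_n} & \xrightarrow{\;A'\;} & \vec{y}_{B_m} \\
{\scriptstyle V}\big\downarrow & & \big\downarrow{\scriptstyle W} \\
\vec{x} & \xrightarrow{\;A\;} & \vec{y}
\end{array}
\]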


We will now attempt to interpret this diagram. Remember that the matrix
$A'$ corresponds to the linear map $T$ with the property that $T(\vec{x}_{B_n}) = \vec{y}_{B_m}$,
and uses the basis $B_n$ for $\mathbb{R}^n$ and $B_m$ for $\mathbb{R}^m$. Starting in the upper-left
corner of this diagram at $\vec{x}_{B_n}$, and following the arrow to the right, which
means applying $A'$, we end up with $\vec{y}_{B_m}$; hence $A'\vec{x}_{B_n} = \vec{y}_{B_m}$. This in
itself does us no good; however, notice that instead of following the arrow
to the right, we could have just as easily applied the matrix $V$ to $\vec{x}_{B_n}$, and
ended up with the vector $\vec{x}$ in the lower-left corner of the diagram. To see
why this is true, remember that $V$ was the matrix whose columns were the
elements of the basis $B_n$, which yields the expansion

\[
\vec{x}_{B_n} = \sum_{j=1}^{n} x_j\vec{v}_j = V \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}^T_{B_n} \tag{9.30}
\]

which is then interpreted in the standard basis $S_n$ as

\[
\sum_{j=1}^{n} x_j\vec{v}_j = \sum_{j=1}^{n} x_j \sum_{k=1}^{n} v_{k,j}\vec{e}_k = \sum_{k=1}^{n} \left( \sum_{j=1}^{n} x_j v_{k,j} \right) \vec{e}_k \tag{9.31}
\]

where $v_{k,j}$ denotes the $k$th component of $\vec{v}_j$.




Now that we have the vector $\vec{x}$ after applying $V$ by left multiplication to
$\vec{x}_{B_n}$, we simply left-multiply by $A$, which is the map matrix under the
standard bases. Thus, we are now in the lower-right corner of the diagram.
At this point, we know that $AV\vec{x}_{B_n} = \vec{y}$. In an argument similar to that
for the case of $V$, we can go from $\vec{y}$ to $\vec{y}_{B_m}$ by left multiplication of $W^{-1}$,
where $W \in \mathbb{R}^{m \times m}$ is the matrix whose columns are the basis vectors of
$B_m$. Hence, we are now in the upper-right corner of the diagram. As a
result, we see that $A'\vec{x}_{B_n} = W^{-1}AV\vec{x}_{B_n}$ for all vectors expressed in terms of the
basis $B_n$. The following is the commutative diagram expressed in terms of
vector spaces and respective bases:

\[
\begin{array}{ccc}
\mathbb{R}^n \text{ with basis } B_n & \xrightarrow{\;A'\;} & \mathbb{R}^m \text{ with basis } B_m \\
{\scriptstyle V}\big\downarrow & & \big\downarrow{\scriptstyle W} \\
\mathbb{R}^n \text{ with basis } S_n & \xrightarrow{\;A\;} & \mathbb{R}^m \text{ with basis } S_m
\end{array}
\]


The power of a commutative diagram lies in the fact that it allows you to
express mappings that you do not know in terms of mappings
that you do know. In our case, we do not know $A'$; however, we do know $V$
and $W$, and can readily determine $A$. Therefore, it is much easier to use
mappings that we already know and express our unknown map $A'$ in terms
of them. In this case, we end up with $A' = W^{-1}AV$.
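
The statement that the diagram commutes can be tested on a small case. The sketch below assumes a hypothetical map on $\mathbb{R}^2$ and two hypothetical bases, all chosen by us purely for illustration; note that it reuses the names A, V, and W, so it overwrites the matrices of Example 9.4.2 if run in the same session.

```mathematica
(* Hypothetical data in R^2, chosen only for illustration *)
A = {{2, 1}, {0, 3}};               (* matrix of T in the standard basis *)
V = Transpose[{{1, 1}, {1, -1}}];   (* columns form the basis of the domain *)
W = Transpose[{{2, -1}, {3, 2}}];   (* columns form the basis of the range *)

(* Top arrow of the diagram *)
APrime = Inverse[W].A.V;

(* Follow both paths from the upper-left corner to the lower-right corner *)
xB = {s, t};   (* symbolic coordinates with respect to the domain basis *)
Simplify[W.(APrime.xB) - A.(V.xB)]
```

The last line should simplify to {0, 0} for every choice of s and t, which says that going right and then down agrees with going down and then right.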


Homework Problems 


1. Represent each vector of the standard basis $S_2$ of $\mathbb{R}^2$ as a linear
combination of vectors in the basis $B = \{(1,1), (1,-1)\}$. Also, write
your answer in vector form, using the correct notation.


2. Express each of the following vectors in $\mathbb{R}^2$ in vector form using
the basis $B$ from problem 1:


(a) (2,0) (b) (2,-3) (c) (9-1) (d) (-2,0) (e) (-4.3) (£) (7,-6) 


3. Verify that the following equation has only the trivial solution:

\[
a\vec{e}_1 + b\vec{e}_2 = a(1,1) + b(1,-1)
\]

Here, $\vec{e}_1$ and $\vec{e}_2$ are the standard basis vectors for $\mathbb{R}^2$. What
does this imply in regard to representation of vectors with the
standard basis and the basis $B$ from problem 1?


4. Represent each vector of the standard basis $S_3$ of $\mathbb{R}^3$ as a linear
combination of vectors in the basis $B = \{(1,0,1), (0,1,1), (0,-1,0)\}$. Also, write your answer in vector form, using the correct notation.


5. Express each of the following vectors in $\mathbb{R}^3$ in vector form using
the basis $B$ from problem 4:


(a) (1,1,0) (b) (2,2,-2) (c) (-5,6,6) (d) (2,-2,3)
(e) (1,1,1) (f) (3,-4,-10)


6. Construct a matrix $A'$ that takes vectors in $\mathbb{R}^2$ expressed in terms
of the basis $B_1 = \{(1,1), (1,-1)\}$ and expresses them in terms of
the basis $B_2 = \{(2,-1), (3,2)\}$, i.e., $A'\vec{x}_{B_1} = \vec{x}_{B_2}$.

7. Construct a matrix $A'$ that takes vectors in $\mathbb{R}^3$ expressed in terms
of the basis $B_1 = \{(1,0,1), (0,1,1), (0,-1,0)\}$, and expresses them
in terms of the basis $B_2 = \{(1,1,-1), (1,-1,1), (1,-1,-1)\}$.

8. Given the map $T : \mathbb{R}^2 \to \mathbb{R}^3$ defined by
$T((x,y)) = (0, y, x)$, find the matrix $A'$ corresponding to $T$
under the two bases $B_2 = \{(1,1), (1,-1)\}$ and $B_3 = \{(0,1,1), (1,0,0), (0,1,0)\}$.

9. Construct the commutation diagram for the map from problem 8. 
10. Given the map $S : \mathbb{R}^3 \to \mathbb{R}^4$ defined by
$S((x,y,z)) = (0, z, y, x)$, find the matrix $A'$ corresponding to $S$
under the following two bases:

\[
\begin{aligned}
B_3 &= \{(0,1,1), (1,0,0), (0,1,0)\} \\
B_4 &= \{(0,1,1,1), (1,0,0,1), (0,1,1,0), (0,0,1,1)\}
\end{aligned}
\]


11. Construct the commutation diagram for the map from problem 
10. 




12. Construct the commutation diagram for $S \circ T$, where $T$ and $S$
are the maps from problems 8 and 10, respectively. Use the diagram
to find the matrix corresponding to the map $S \circ T$.


13. Consider the case of a linear map whose domain is represented
by a nonstandard basis $B_n$, and whose image is also represented by
a nonstandard basis $B_m$. Hence, we already have
$A'\vec{x}_{B_n} = \vec{y}_{B_m}$. How can you recover the original map's
matrix $A$ in the standard bases, given the matrix $A'$ that represents $T$
for the pair of nonstandard bases? (Hint: Drawing a commutation
diagram can help.)


14. In problems 6 and 7, verify directly that $(A')^{-1}$ reverses the
order of the two bases.

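15. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map where we have two
(different) pairs of bases, bases $B$ and $C$ for $\mathbb{R}^n$ and bases $D$ and $E$
for $\mathbb{R}^m$. Let $T^B_D$ be the matrix that represents the linear map $T$ in
the two bases $B$ on the domain $\mathbb{R}^n$ and $D$ on the range $\mathbb{R}^m$.
Similarly, let $T^C_E$ be the matrix that represents the linear map $T$ in
the two bases $C$ on the domain $\mathbb{R}^n$ and $E$ on the range $\mathbb{R}^m$. Also,
for $I$ the identity linear map from $\mathbb{R}^n$ to $\mathbb{R}^n$, we have the matrix
$I^B_C$ that represents the linear map $I$ in the two bases $B$ and $C$, while
similarly, the matrix $I^D_E$ represents the identity linear map $I$ from
$\mathbb{R}^m$ to $\mathbb{R}^m$ in the two bases $D$ and $E$.


(a) Explain the meaning, and discuss the validity, of the
following commutative diagram:

\[
\begin{array}{ccc}
\mathbb{R}^n_B & \xrightarrow{\;T^B_D\;} & \mathbb{R}^m_D \\
{\scriptstyle I^B_C}\big\downarrow & & \big\downarrow{\scriptstyle I^D_E} \\
\mathbb{R}^n_C & \xrightarrow{\;T^C_E\;} & \mathbb{R}^m_E
\end{array}
\]

(b) Is the matrix equation $T^B_D = I^E_D\, T^C_E\, I^B_C$ correct? Explain
your reasoning.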

(c) Verify with an example the matrix equation in part (b) when 
only one pair B, D of bases are the standard bases of R?. 




(d) Verify with an example the matrix equation in part (b) when 
all four bases are not the standard bases of IR?. 
16. (Continuation of problem 15.) Let $I$ be the identity linear map
from $\mathbb{R}^n$ to $\mathbb{R}^n$ and $B$, $C$ be two bases of $\mathbb{R}^n$.

(a) Explain why $I^B_B = I^C_C = I_n$, where $I_n$ is the $n \times n$
identity matrix.

(b) Explain why $I^C_B = \left(I^B_C\right)^{-1}$.

(c) Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map. Explain why

\[
T^B_B = I^C_B\, T^C_C\, I^B_C
\]

is correct, and thus

\[
T^B_B = I^C_B\, T^C_C \left(I^C_B\right)^{-1}
\]

This last equation states that the two $n \times n$ matrices
$T^B_B$ and $T^C_C$ are similar matrices.

(d) Explain why $\left(T^{-1}\right)^C_C = \left(T^C_C\right)^{-1}$.

(e) Why is $\det\left(T^C_C\right) = \det\left(T^B_B\right)$?


(£) Let n = 2, and do examples to illustrate parts (a)-(e) above. 


Mathematica Problems 


1. (a) Represent each vector of the first basis B as a linear 
combination of vectors from the second basis C, and vice versa: 


B = {(1,1,—1), (1, —1, 1), (1, -1, -1)} 
C = {(0,1,—1), (1,0, 1), (1, —1,0)} 


(b) Construct the matrix $A'$ that takes vectors in $\mathbb{R}^3$ expressed
in terms of the basis $B$, and expresses them in terms of the basis
$C$, where the two bases are defined as in the previous problem.
As in the "homework problems" section, $A'$ should satisfy
$A'\vec{x}_B = \vec{x}_C$. Now do this in reverse and relate the two
matrices $A'$ that result.


2. (a) Represent each vector of the first basis B as a linear 
combination of vectors from the second basis C, and vice versa. 


B = {(1, 1, -1, 0), (0, 1, -1, 1), (1, 0, -1, -1), (1, 0, 1, 0)}
C = {(1, 0, 1, 1), (0, 1, 1, 1), (0, 0, 1, 0), (-1, -1, 1, 0)}


(b) Construct the matrix $A'$ that takes vectors in $\mathbb{R}^4$ expressed
in terms of the basis $B$, and expresses them in terms of the basis
$C$, where the two bases are defined as in the previous problem.
Now do this in reverse and relate the two matrices A’ that 
result. 


3. Consider the maps $S : \mathbb{R}^4 \to \mathbb{R}^5$, $T : \mathbb{R}^5 \to \mathbb{R}^2$, and $U : \mathbb{R}^2 \to \mathbb{R}^4$ defined by

\[
\begin{aligned}
S((w,x,y,z)) &= (w + x,\; x + y,\; y + z,\; z + w,\; w + y) \\
T((a,b,c,d,e)) &= (a + b - c + d - e,\; a + c + e) \\
U((j,k)) &= (j,\; j + k,\; j - k,\; k)
\end{aligned}
\]

and bases $B_2$, $B_4$, and $B_5$, given by

\[
\begin{aligned}
B_2 &= \{(1,1), (-1,1)\} \\
B_4 &= \{(1,1,-1,0), (0,1,-1,1), (1,0,-1,-1), (1,0,1,0)\} \\
B_5 &= \{(1,1,0,0,0), (0,1,1,0,0), (1,0,1,1,0), (0,0,1,1,0), (0,1,0,1,1)\}
\end{aligned}
\]

Compute the matrices corresponding to the following maps with the
given bases for domains and ranges:

(a) $S$ with $B_4$ and $B_5$

(b) $T$ with $B_5$ and $B_2$

(c) $U$ with $B_2$ and $B_4$

(d) $T \circ S$ with $B_4$ and $B_2$

(e) $U \circ T$ with $B_5$ and $B_4$

(f) $S \circ U$ with $B_2$ and $B_5$

(g) $U \circ T \circ S$ with $B_4$ and $B_4$

(h) $T \circ S \circ U$ with $B_2$ and $B_2$

(i) $S \circ U \circ T$ with $B_5$ and $B_5$

4. Use homework problems 15 and 16 to 
(a) Find the following matrices: 


(i) IB (ii) I8 (iii) IS (iv) IB 
(v) TB (vi) TP (vii) TP (viii) TP 
(ix) (T)2 (x) (T")B (xi), (TB. (iit), (T")B 
(xiii) TE (xiv) TE (xv) TE (xvi) TE 


(xvii) (T—")E (xvii) (TẸ (xix) (TE (xx) (TE 


for the linear map $T : \mathbb{R}^3 \to \mathbb{R}^3$ given by

\[
T((x,y,z)) = (7x - y + 2z,\; -6x + 3y - 5z,\; x + y + 4z)
\]


in the four bases: 


B = {(—2,1,4), (5, —9,6), (7,3, —11)} 
C = {(-17,4,8), (10, —5, 13), (15,6, -7)} 
D = {(-13, 2, —10), (1, 2,9), (5, -1,3)} 
E = {(—5, —4, -1), (2,7, 4), (7, -3, 6)} 


(b) Find the determinants of all of these matrices. 


(c) Explain all possible relationships between these matrices 
and their determinants. 


5. Use homework problems 15 and 16 to 
(a) Find the following matrices: 




i) 8B (i) IR (iii) Ig (iv) JB 


(v) T% (vi) TP (vii) TB (viii) TR 
x) (T")P œ (T œ) (T8 (xi) (TB 
(xiii) TẸ (xiv) TE (xv) TS (xvi) TF 

E Cc E 


(xvii) (T-")E (xvii) (TYE (xix) (TG x) (TYE 


for the linear map $T : \mathbb{C}^2 \to \mathbb{C}^2$ given by
$T((x,y)) = (ix - y,\; 6x + (2 + 5i)y)$
in the four bases: 
B = {(i,-i), (1+i,1)} 
C= {(5 -i,i), (2+i,-i)} 


D = {(1,i), (i,-1)} 
E = {(i, -3), (-1, —i)} 
(b) Find the determinants of all of these matrices. 


(c) Explain all possible relationships between these matrices 
and their determinants. 




Chapter 10 


The Geometry of Linear and 
Affine Maps 


10.1 The Effect of a Linear 
Map on Area and Arclength 
in Two Dimensions 


As we discussed in Chapter 9, a linear map $T$ can be represented by matrix
multiplication. In particular, $T : \mathbb{R}^2 \to \mathbb{R}^2$ can be represented by a $2 \times 2$
matrix $A$. Now we will investigate how such functions transform standard
geometric objects, such as line segments, types of quadrilaterals,
polygons, and circles, as well as general simple closed curves in $\mathbb{R}^2$. In
addition, we want to see how the determinant of the matrix $A$ affects the
geometric properties of objects transformed under the linear map $T$.


We shall begin with a line segment $S$ joining two points, expressed in
vector form as $\vec{P} = (x_1, y_1)$ and $\vec{Q} = (x_2, y_2)$. All the points of this line
segment can be parametrically written as

\[
\begin{aligned}
S &= t\vec{P} + (1 - t)\vec{Q} \\
&= (tx_1 + (1 - t)x_2,\; ty_1 + (1 - t)y_2)
\end{aligned}
\]


for the parameter $t \in [0,1]$. The point $\vec{Q}$ corresponds to $t = 0$ and $\vec{P}$
corresponds to $t = 1$. Plugging this equation for the line segment into the
linear map $T$, we have

\[
\begin{aligned}
T(S) &= A(t\vec{P} + (1 - t)\vec{Q}) \\
&= tA\vec{P} + (1 - t)A\vec{Q} \\
&= tT(\vec{P}) + (1 - t)T(\vec{Q})
\end{aligned}
\]

which is the line segment joining the point $T(\vec{P})$ to the point $T(\vec{Q})$.
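
The claim that a linear map sends a line segment to a line segment can be confirmed symbolically. In this minimal sketch, the matrix A and the endpoints P and Q are sample values chosen only for illustration, and the names lhs and rhs are our own; the parameter t is left symbolic.

```mathematica
(* Sample matrix and endpoints, chosen only for illustration *)
A = {{-2, 7}, {5, 3}};
P = {-4, -9}; Q = {8, 3};

(* Image of a general point of the segment *)
lhs = A.(t P + (1 - t) Q);

(* Point at parameter t on the segment joining T(Q) to T(P) *)
rhs = t (A.P) + (1 - t) (A.Q);

Simplify[lhs - rhs]
```

The difference should simplify to {0, 0} for every value of t, so each point of the original segment lands on the segment between the images of the endpoints, at the same parameter value.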


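A linear map $T : \mathbb{R}^2 \to \mathbb{R}^2$ is uniquely determined by sending a line
segment $L_1 = \{(x_1, y_1), (x_2, y_2)\}$ in its domain, where $L_1$ does not
determine a line through the origin, to another line segment $L_2 = \{(z_1, w_1), (z_2, w_2)\}$ in its range. This means that
$T((x_j, y_j)) = (z_j, w_j)$, for $j = 1, 2$. Equivalently, if we
set

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]

then

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x_j \\ y_j \end{pmatrix} = \begin{pmatrix} z_j \\ w_j \end{pmatrix}
\]

for $j = 1, 2$ and the four unknowns $a$, $b$, $c$, and $d$. This matrix equation
gives us the two pairs of linear systems:

\[
\begin{aligned}
ax_j + by_j &= z_j, \quad \text{for } j = 1, 2 \\
cx_j + dy_j &= w_j, \quad \text{for } j = 1, 2
\end{aligned} \tag{10.1}
\]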


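The first equation in system (10.1) gives rise to the matrix system

\[
\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
\]

and similarly for the second equation in system (10.1), we get

\[
\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
\]

We can combine the two matrix equations above into the single equation:

\[
\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix} \begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} z_1 & w_1 \\ z_2 & w_2 \end{pmatrix} \tag{10.2}
\]

We can now solve this for $A$, although we have to be careful, since it is not
$A$ that appears in the formula above, but $A^T$. So equation (10.2) can be
expressed in terms of $A$, $L_1$, and $L_2$ as

\[
L_1 A^T = L_2 \tag{10.3}
\]

where $L_1$ and $L_2$ are now regarded as the $2 \times 2$ matrices whose rows are the endpoints of the corresponding segments. First, we get

\[
A^T = (L_1)^{-1} L_2
\]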


and then we transpose both sides to get

\[
A = \left(L_1^{-1} L_2\right)^T = L_2^T \left(L_1^{-1}\right)^T
\]

Here we utilized the fact that $(BC)^T = C^T B^T$ for any matrices $B$ and $C$ of
the correct dimensions. Now, note that we could have started solving
equation (10.3) for $A$ by first taking a transpose. This would have given

\[
A L_1^T = L_2^T
\]

Then solving for $A$ gives

\[
A = L_2^T \left(L_1^T\right)^{-1}
\]

which shows that $\left(L_1^{-1}\right)^T = \left(L_1^T\right)^{-1}$, thus reminding us that the
transpose of the inverse of a matrix is equivalent to the inverse of the
transpose of a matrix. Now, to solve for $A$, we prefer to use the second
expression, which in full matrix form is

\[
A = \begin{pmatrix} z_1 & z_2 \\ w_1 & w_2 \end{pmatrix} \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}^{-1}
\]

Note that the matrix inverse on the right in the formula above exists if and
only if the line segment $L_1$ does not determine a line through the origin.
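
The formula and its existence condition can be packaged into a small helper. This is a sketch under stated assumptions: the function name segmentMapMatrix, its argument names, and the sample points are hypothetical and are not part of the text. The helper stores the endpoints as columns, so its internal matrices correspond to $L_1^T$ and $L_2^T$.

```mathematica
(* Hypothetical helper: matrix of the linear map with p -> tp and q -> tq *)
segmentMapMatrix[p_, q_, tp_, tq_] :=
  Module[{m1 = Transpose[{p, q}], m2 = Transpose[{tp, tq}]},
    If[Det[m1] == 0,
      (* p and q lie on a line through the origin: no unique map *)
      Missing["SegmentThroughOrigin"],
      m2.Inverse[m1]]]

(* Sample data, chosen only for illustration *)
M = segmentMapMatrix[{1, 2}, {3, 1}, {0, 5}, {4, -1}];
{M.{1, 2}, M.{3, 1}}

(* Endpoints on a line through the origin are rejected *)
segmentMapMatrix[{1, 2}, {2, 4}, {0, 5}, {4, -1}]
```

The first check should return the prescribed images {{0, 5}, {4, -1}}, while the second call returns the Missing marker because the determinant of the endpoint matrix is zero.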


Example 10.1.1. Now, let us see if this really works by finding the matrix $A$ that
represents the linear map $T$, where $T((5,13)) = (-2,-9)$ and $T((-7,3)) = (4,-1)$. In the code below, L1 and L2 hold the endpoints as columns, so they play the roles of $L_1^T$ and $L_2^T$.


P = {5, 13}; Q = {-7, 3};
(L1 = Transpose[{P, Q}]) // MatrixForm


5 -7
13 3


TP = {-2, -9}; TQ = {4, -1};
(L2 = Transpose[{TP, TQ}]) // MatrixForm


(A = L2.Inverse[L1]) // MatrixForm


-29/53 3/53
-7/53 -34/53

{A.P, A.Q}


{{-2, -9}, {4, -1}}


Example 10.1.2. In the first example, we computed the matrix $A$ corresponding to a
given map of two points in $\mathbb{R}^2$. For our next example, we will instead define the matrix
$A$ as

\[
A = \begin{pmatrix} -2 & 7 \\ 5 & 3 \end{pmatrix}
\]

and explore the geometric ramifications of this map to lines and squares in $\mathbb{R}^2$. First, we
start with lines, and then plot the line segment joining $P = (-4,-9)$ to $Q = (8,3)$ along with its transformed version, as depicted in Figure 10.1.

A = {{-2, 7}, {5, 3}};

P = {-4, -9}; Q = {8, 3};

Norm[P - Q]


12 √2


TP = A.P
{-55, -47}


TQ = A.Q
{5, 49}
Norm[TP - TQ]

12 √89


Det[A]

-41

N[Norm[TP - TQ]/Norm[P - Q]]

N[Sqrt[Abs[Det[A]]]]

6.67083

6.40312

LinePlots = Graphics[{Thickness[.008], Blue, Line[{P, Q}], Red, Line[{TP, TQ}]}];


TxtPlot = Graphics[{Black, Text["P", {-4, -11}], Text["Q", {8, 5}], Text["TP",
{-55, -49}], Text["TQ", {5, 51}]}];


Show[LinePlots, TxtPlot, Axes -> True]


Figure 10.1: The line segment $PQ$ is transformed to $T(P)T(Q)$.


486 


Note that the length of the line segment from 7( P, to 7( Q, is approximately 


yidet{A)] 


times the length of the line segment from P to 


(10.4) DIT(P), T(Q)) ~ V/\det(A)|D(P, Q) 


N[Norm[TP - TQ]]

113.208

N[Sqrt[Abs[Det[A]]] Norm[P - Q]]
108.665
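
One way to probe the ratio just observed is to measure how much A stretches unit vectors pointing in different directions. The sketch below is an illustration added here, not part of the original example; the name stretch and the one-degree sampling step are assumptions made for the illustration.

```mathematica
A = {{-2, 7}, {5, 3}};

(* Hypothetical helper: length of the image of the unit vector at angle t *)
stretch[t_] := Norm[A.{Cos[t], Sin[t]}]

(* Sample directions from 0 to Pi; opposite directions give the same value *)
samples = Table[N[stretch[t]], {t, 0, Pi, Pi/180}];

(* Smallest and largest sampled stretch, next to Sqrt[|det(A)|] *)
{Min[samples], Max[samples], N[Sqrt[Abs[Det[A]]]]}
```

For this matrix the sampled stretch factors fall roughly between 5.38 and 7.62, so the ratio for a particular segment depends on its direction, while $\sqrt{|\det(A)|} \approx 6.40$ sits between the two extremes.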


Is this always approximately true, or not? We will attempt to answer this 
shortly. Let us next move on to the simplest geometric figure with an area: 
the unit square. The unit square has vertices at the points (0, 0), (1, 0),
(1, 1), and (0, 1), as we move around its perimeter in a counterclockwise
fashion. Every point of the square, and its interior, can be written as
$(s, t) = s(1, 0) + t(0, 1)$ for $0 \le s, t \le 1$. This follows because the
vectors (1, 0) and (0, 1) form the standard basis for $\mathbb{R}^2$,
and if $0 \le s, t \le 1$, then the resulting point will always have coordinates
between 0 and 1 in value. If we wish to apply a linear map T to an
arbitrary point (s, t) on, or in the interior of, the unit square, notice what
happens when we use the linear properties of the map T:

(10.5) $$T((s, t)) = T(s(1, 0) + t(0, 1)) = s\,T((1, 0)) + t\,T((0, 1))$$


It appears that, given the location of the transformed points corresponding
to (1, 0) and (0, 1), we can determine the location of any other point on,
and in, the square. In reality, we also know the locations of two other
points as well. In particular, it is easy to show that $T((0, 0)) = (0, 0)$ and
$T((1, 1)) = T((1, 0)) + T((0, 1))$. So the transform of our square is a
parallelogram with corners (0, 0), $T((1, 0))$, $T((0, 1))$, and
$T((1, 0)) + T((0, 1))$. Also, if we represent T by a 2 x 2 matrix A, then notice that
$T((1, 0))$ is the first column of A, while $T((0, 1))$ is the second column of A.
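
The two observations above can be checked directly. This is a minimal sketch using the matrix A of Example 10.1.2; the sample values s = 1/4 and t = 2/3 are arbitrary choices made for the illustration.

```mathematica
A = {{-2, 7}, {5, 3}};

(* The images of the standard basis vectors are the columns of A *)
{A.{1, 0} == A[[All, 1]], A.{0, 1} == A[[All, 2]]}
(* {True, True} *)

(* An interior point (s, t) maps to the same combination of the columns *)
With[{s = 1/4, t = 2/3},
 A.{s, t} == s A[[All, 1]] + t A[[All, 2]]]
(* True *)
```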
Example 10.1.3. Using the matrix A defined in the previous example, we will transform
the unit square and its interior, as depicted in Figure 10.2. We also want to see if the areas
of the unit square and the resulting parallelogram are related. Recall that the area of a
parallelogram, where two adjacent sides are the vectors v and w, is given by
$|v \times w|$. Also, remember that the cross product can be applied only to vectors
in $\mathbb{R}^3$. Since we are dealing with points in $\mathbb{R}^2$, we will tack on an extra coordinate to
make them points in $\mathbb{R}^3$, where the third coordinate is always 0. The choice of 0
matters here: $|v \times w|$ equals the area only when both vectors lie in the xy-plane.
Formulas that set the last coordinate to 1 instead of 0 read the area off the third
component of the cross product rather than from its norm.


UnitSquare = Graphics[{Blue, Polygon[{{0, 0}, {0, 1}, {1, 1}, {1, 0}}]}];
T[{x_, y_}] := A.{{x}, {y}};

TUnitSquare = Graphics[{Red, Polygon[{Flatten[T[{0, 0}]], Flatten[T[{0, 1}]],
Flatten[T[{1, 1}]], Flatten[T[{1, 0}]]}]}];

TxtPlot2 = Graphics[{Black, Text["T(1,0)", {-2.5, 5}], Text["T(1,1)", {5, 8.5}],
Text["T(0,1)", {7.5, 3}]}];

Show[TUnitSquare, UnitSquare, TxtPlot2, Axes -> True]


Figure 10.2: The unit square and its transform parallelogram.


T3DVector[{x_, y_}] := Flatten[{A.{{x}, {y}}, 0}];
{T3DVector[{1, 0}], T3DVector[{0, 1}]}

{{-2, 5, 0}, {7, 3, 0}}

Norm[Cross[T3DVector[{0, 1}], T3DVector[{1, 0}]]]
41

Det[A]

-41


In this example, the linear map T has turned the unit square into a parallelogram with area 
precisely |det(A)| = 41. 


It appears at first glance that linear maps preserve the overall geometric 
properties of an object in the plane, with area changed by the 
multiplicative factor |det(A)|. Let us see if this is true for a generic 
parallelogram. We will define our parallelogram to have vertices
(0, 0), (a, b), (a + c, b + d), and (c, d), and the matrix A to be

$$A = \begin{pmatrix} \alpha & \beta \\ \delta & \theta \end{pmatrix}$$

We will show, with the help of Mathematica, that the area of transformed 
parallelograms does satisfy the property seen in the previous example: 


v = {a, b}; w = {c, d};

A = {{α, β}, {δ, θ}};

Det[A]

-β δ + α θ

T3DVector[{x_, y_}] := Flatten[{A.{{x}, {y}}, 0}];
Norm[Cross[Flatten[{v, 0}], Flatten[{w, 0}]]]
Abs[-b c + a d]

Norm[Cross[T3DVector[v], T3DVector[w]]]
Abs[b c β δ - a d β δ - b c α θ + a d α θ]

Abs[Factor[%[[1]]]]

Abs[(b c - a d) (β δ - α θ)]


So the area of the transformed parallelogram using the linear map T is

$$\text{Transformed area} = |(\alpha\theta - \beta\delta)(ad - bc)| = |\alpha\theta - \beta\delta|\,|ad - bc| = |\det(A)| \cdot (\text{the area of the original parallelogram}).$$


Now we might ask if this is also the case for general simple polygons in
the xy-plane. By use of the term simple, we mean that the polygon is not
self-intersecting. Let P be a simple polygon of k sides with k consecutive
(clockwise or counterclockwise) vertices $\{(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)\}$.
Then the area of the polygon P is given by the formula

(10.6) $$\text{Area of } P = \frac{1}{2}\left( \det\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} + \det\begin{pmatrix} x_2 & x_3 \\ y_2 & y_3 \end{pmatrix} + \cdots + \det\begin{pmatrix} x_{k-1} & x_k \\ y_{k-1} & y_k \end{pmatrix} + \det\begin{pmatrix} x_k & x_1 \\ y_k & y_1 \end{pmatrix} \right)$$

Let us apply the general linear map T above to this polygon to get a new
polygon Q with k consecutive vertices $\{T((x_1, y_1)), T((x_2, y_2)), \ldots, T((x_k, y_k))\}$.
Are their areas related as they are for parallelograms? We will
check this for an arbitrary polygon with k = 7 sides:


(P = Table[{Subscript[x, i], Subscript[y, i]}, {i, 1, 7}]) // MatrixForm

x1  y1
x2  y2
x3  y3
x4  y4
x5  y5
x6  y6
x7  y7

T[{x_, y_}] := A.{{x}, {y}};

(TP = Table[Flatten[T[P[[k]]]], {k, 1, 7}]) // MatrixForm

α x1 + β y1   δ x1 + θ y1
α x2 + β y2   δ x2 + θ y2
α x3 + β y3   δ x3 + θ y3
α x4 + β y4   δ x4 + θ y4
α x5 + β y5   δ x5 + θ y5
α x6 + β y6   δ x6 + θ y6
α x7 + β y7   δ x7 + θ y7

AreaTP = 1/2 (TP[[7, 1]] TP[[1, 2]] + Sum[TP[[k, 1]] TP[[k + 1, 2]], {k, 1, 6}] -
TP[[7, 2]] TP[[1, 1]] - Sum[TP[[k, 2]] TP[[k + 1, 1]], {k, 1, 6}])

1/2 (-(δ x1 + θ y1) (α x2 + β y2) + (α x1 + β y1) (δ x2 + θ y2)
 - (δ x2 + θ y2) (α x3 + β y3) + (α x2 + β y2) (δ x3 + θ y3)
 - (δ x3 + θ y3) (α x4 + β y4) + (α x3 + β y3) (δ x4 + θ y4)
 - (δ x4 + θ y4) (α x5 + β y5) + (α x4 + β y4) (δ x5 + θ y5)
 - (δ x5 + θ y5) (α x6 + β y6) + (α x5 + β y5) (δ x6 + θ y6)
 + (δ x1 + θ y1) (α x7 + β y7) - (δ x6 + θ y6) (α x7 + β y7)
 - (α x1 + β y1) (δ x7 + θ y7) + (α x6 + β y6) (δ x7 + θ y7))

AreaP = 1/2 (P[[7, 1]] P[[1, 2]] + Sum[P[[k, 1]] P[[k + 1, 2]], {k, 1, 6}] -
P[[7, 2]] P[[1, 1]] - Sum[P[[k, 2]] P[[k + 1, 1]], {k, 1, 6}])

1/2 (-x2 y1 + x7 y1 + x1 y2 - x3 y2 + x2 y3 - x4 y3 + x3 y4
 - x5 y4 + x4 y5 - x6 y5 + x5 y6 - x7 y6 - x1 y7 + x6 y7)

Simplify[AreaTP - Det[A] AreaP]
0


Now we see that the linear map T transforms a polygon P, of k vertices,
into a new polygon TP, of k vertices, with

(10.7) $$\text{Area of } TP = |\det(A)| \cdot (\text{area of } P)$$

at least for the k = 4 and k = 7 cases. If this still does not convince you that
the transformed area does follow the given formula, consider the
following argument. First, we refer back to the equation for the area of a
polygon, given in equation (10.6). Because of the way in which it is
written, it would be enough to show the following

$$\det\begin{pmatrix} \alpha x_j + \beta y_j & \alpha x_{j+1} + \beta y_{j+1} \\ \delta x_j + \theta y_j & \delta x_{j+1} + \theta y_{j+1} \end{pmatrix} = \det(A) \cdot \det\begin{pmatrix} x_j & x_{j+1} \\ y_j & y_{j+1} \end{pmatrix}$$

for $1 \le j \le k$, with j = k giving j + 1 = 1. Notice that the matrix on the LHS
of the equation is simply the product of A and the matrix whose columns
are the points $(x_j, y_j)$ and $(x_{j+1}, y_{j+1})$. From Section 5.1, we know that
given any two square matrices A and B of the same size, det(AB) = det(A) det(B). Hence,
this last expression automatically holds. This allows a factor of det(A) to
be pulled out of each term in equation (10.6), and taking absolute values
gives us equation (10.7).
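
The determinant sum in equation (10.6) is short to code for any number of vertices. The following is a sketch only: the function name polygonArea is hypothetical, and it returns the signed value of the sum, so a negative result indicates that the vertices are listed clockwise.

```mathematica
(* Hypothetical helper: the sum of 2 x 2 determinants from (10.6).
   Partition[pts, 2, 1, 1] pairs each vertex with the next one, cyclically. *)
polygonArea[pts_] := Total[Det /@ (Transpose /@ Partition[pts, 2, 1, 1])]/2

A = {{-2, 7}, {5, 3}};
square = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
triangle = {{0, 0}, {4, 0}, {0, 3}};

{polygonArea[square], polygonArea[(A.#) & /@ square]}
(* {1, -41} *)

{polygonArea[triangle], polygonArea[(A.#) & /@ triangle]}
(* {6, -246} *)
```

In both cases the signed area is multiplied by det(A) = -41; the sign change reflects the reversed orientation of the transformed vertices.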


Example 10.1.4. As another example, let us see the effects of the linear map T from
Example 10.1.2 on the circle with center (3, 8) and radius 6. The standard equation for
this circle is given by $(x - 3)^2 + (y - 8)^2 = 36$; however, it can also be written
parametrically as

$$C: (x(t), y(t)) = (3 + 6\cos(t), 8 + 6\sin(t)), \quad t \in [0, 2\pi]$$


A = {{-2, 7}, {5, 3}};

T[{x_, y_}] := A.{{x}, {y}};

CP = {3 + 6 Cos[t], 8 + 6 Sin[t]};
TCP = Flatten[T[CP]]

{-2 (3 + 6 Cos[t]) + 7 (8 + 6 Sin[t]), 5 (3 + 6 Cos[t]) + 3 (8 + 6 Sin[t])}

PlotCPTCP = ParametricPlot[{CP, TCP}, {t, 0, 2 Pi}, PlotStyle ->
{{Thickness[0.007], Blue}, {Thickness[0.007], Red}}];

PtsPlot = Graphics[{PointSize[0.015], Black, Point[{{3, 8}, Flatten[T[{3, 8}]]}]}];
Show[PlotCPTCP, PtsPlot, PlotRange -> All]


Figure 10.3: Circle is transformed to an ellipse by T.


Figure 10.3 illustrates the fact that T transforms a circle into an ellipse with T taking the 
circle’s center to the ellipse’s center. This also implies that T takes the interior of the 
circle to the interior of the ellipse. To visualize which points on the circle correspond to 
which points on the ellipse, we will have Mathematica construct an animation. We will 
plot two vectors, one for the circle and one for the ellipse. The bases of the vectors will 
be at the corresponding centers, while the tips of the vectors will be on the circle and 
ellipse. Since both objects have been previously defined parametrically, we can pick N +
1 evenly spaced points in the interval $[0, 2\pi]$, where the first point is t = 0 and the last is
$t = 2\pi$, to help with the animation. A frame of this animation can be seen in Figure 10.4.


Manipulate[
 ArrowPlots = {Graphics[{Arrowheads[.03], Thickness[.005], Blue,
 Arrow[{{3, 8}, CP /. t -> tval}]}], Graphics[{Arrowheads[.06],
 Thickness[.010], Red, Arrow[{Flatten[T[{3, 8}]], TCP /. t -> tval}]}]};
 Show[PlotCPTCP, ArrowPlots, PtsPlot, PlotRange -> All],
 {{tval, 0, "t"}, 0, 2 Pi, Pi/32}]


Figure 10.4: The point on the circle at the tip of the short arrow corresponds to the point 
on the ellipse at the tip of the long arrow. The long arrow rotates clockwise while the 
short rotates counterclockwise. 




Now, let us compute the arclengths and areas for both curves to see if they are related. To 
do this, we need to introduce a definition and two formulas. 


We now generalize the circle and ellipse to a general type of curve called a simple closed 
curve. A simple closed curve should be continuously differentiable, except possibly for a 
finite number of corner points, and enclose a simply connected region, that is, a region in 
one connected piece with no holes in it. Besides the circle and ellipse, a square, rectangle 
or parallelogram are also simple closed curves, as well as all simple closed polygons. 


Definition 10.1.1. A parametric curve $C: (x, y) = (x(t), y(t))$, $t \in [a, b]$, is said
to be a simple closed curve if the following points apply:

Closed: its starting point at t = a is also its stopping point at t = b.
Simple: it intersects itself only at its endpoints, which are the same point.
Piecewise $C^1$ curve: it is a continuously differentiable curve except for at most a finite
number of corner points.


If $C: (x, y) = (x(t), y(t))$ for $t \in [a, b]$ is a parametric curve, then the arclength
L is given by the integral:

(10.8) $$L = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\; dt$$

For the circle, the arclength is $12\pi$ and the area is $36\pi$.

From Green’s theorem of multivariable calculus, the area of the region inside a simple
closed curve $C: (x, y) = (x(t), y(t))$ for $t \in [a, b]$ is

(10.9) $$\text{Area of } C = \int_a^b x(t)\,\frac{dy}{dt}\; dt = -\int_a^b y(t)\,\frac{dx}{dt}\; dt = \frac{1}{2}\int_a^b \left( x(t)\,\frac{dy}{dt} - y(t)\,\frac{dx}{dt} \right) dt$$

if C is traversed in the counterclockwise direction. The area is the negative of these
integrals if C is traversed in the clockwise direction. The ellipse is traversed clockwise,
and so we take the negative of these integrals for its area:


NIntegrate[Norm[D[TCP, t]], {t, 0, 2 Pi}]

246.86

ArcLengthEllipse = NIntegrate[Sqrt[D[TCP, t].D[TCP, t]], {t, 0, 2 Pi}]
246.86

N[ArcLengthEllipse/(12 Pi)]

6.54817

N[Sqrt[Abs[Det[A]]]]

6.40312

AreaEllipse = N[-Integrate[TCP[[1]] D[TCP[[2]], t], {t, 0, 2 Pi}]]
4636.99

N[AreaEllipse/(36 Pi)]

41.

Abs[Det[A]]

41


Again, we see that the linear map T multiplies arclength by approximately
$\sqrt{|\det(A)|}$, while it multiplies area by exactly |det(A)|. Should we expect


the same relationships involving area and arclength to hold if the simple 
closed curve C is more complicated than a circle? 


Example 10.1.5. Let us investigate the effects of the linear map T from Example 10.1.2
on the simple closed curve given by

$$C: (x(t), y(t)) = ((3 - \cos(12t))\cos(t) + 7,\; (3 - \cos(12t))\sin(t) + 11), \quad t \in [0, 2\pi]$$


The curve and its transform can be seen in Figures 10.5 and 10.6. 


CP2 = {(3 - Cos[12 t]) Cos[t] + 7, (3 - Cos[12 t]) Sin[t] + 11};
TCP2 = Simplify[Flatten[T[CP2]]]

{63 + 2 Cos[t] (-3 + Cos[12 t]) - 7 (-3 + Cos[12 t]) Sin[t],
 68 - 5 Cos[t] (-3 + Cos[12 t]) - 3 (-3 + Cos[12 t]) Sin[t]}

PlotCPTCP2 = ParametricPlot[{CP2, TCP2}, {t, 0, 2 Pi}, PlotStyle ->
{{Thickness[0.004], Blue}, {Thickness[0.007], Red}}];

PtsPlot2 = Graphics[{PointSize[0.01], Black, Point[{{7, 11}, Flatten[T[{7, 11}]]}]}];

Show[PlotCPTCP2, PtsPlot2, PlotRange -> All]


Figure 10.5: Simple closed curve C with center at (7, 11) and
transformed closed curve with center at (63, 68).

Manipulate[
 ArrowPlots = {Graphics[{Arrowheads[.02], Thickness[.005], Blue, Arrow[{{7, 11},
 CP2 /. t -> tval}]}], Graphics[{Arrowheads[.06], Thickness[.010], Red,
 Arrow[{Flatten[T[{7, 11}]], TCP2 /. t -> tval}]}]};
 Show[PlotCPTCP2, ArrowPlots, PtsPlot2, PlotRange -> All],
 {{tval, 0, "t"}, 0, 2 Pi, Pi/32}]


Figure 10.6: The point on the simple closed curve C at the tip of
the short arrow corresponds to the point on the transformed simple
closed curve at the tip of the long arrow. The long arrow rotates
clockwise around the transformed curve, while the short arrow
rotates counterclockwise around the original.


ArcLengthCP2 = NIntegrate[Sqrt[D[CP2, t].D[CP2, t]], {t, 0, 2 Pi}]
53.0349

ArcLengthTCP2 = NIntegrate[Sqrt[D[TCP2, t].D[TCP2, t]], {t, 0, 2 Pi}]
347.281

{ArcLengthTCP2/ArcLengthCP2, N[Sqrt[Abs[Det[A]]]]}

{6.54817, 6.40312}

AreaCP2 = NIntegrate[CP2[[1]] D[CP2[[2]], t], {t, 0, 2 Pi}]
29.8451

AreaTCP2 = -NIntegrate[TCP2[[1]] D[TCP2[[2]], t], {t, 0, 2 Pi}]
1223.65

{AreaTCP2/AreaCP2, Abs[Det[A]]}

{41., 41}


Once again, we have that T multiplies arclength by approximately
$\sqrt{|\det(A)|}$, while it multiplies area by exactly |det(A)|. To prove this
relation holds in general, we employ a method similar to the argument
used for simple closed polygons. Letting $T: \mathbb{R}^2 \to \mathbb{R}^2$ be
represented by the 2 x 2 matrix

$$A = \begin{pmatrix} \alpha & \beta \\ \delta & \theta \end{pmatrix}$$

and $C: (x, y) = (x(t), y(t))$ be an arbitrary simple closed curve for $t \in [a, b]$
with counterclockwise orientation, we wish to prove the following:

(10.10) $$\text{Area of } T(C) = |\det(A)| \cdot (\text{area of } C)$$

The first thing to note is that we have

$$T(C): (x, y) = (\alpha x(t) + \beta y(t),\; \delta x(t) + \theta y(t)), \quad t \in [a, b]$$


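If the curve T(C) is also counterclockwise, then

$$\begin{aligned}
\text{Area of } T(C) &= \int_a^b (\alpha x(t) + \beta y(t))\,\frac{d}{dt}\bigl(\delta x(t) + \theta y(t)\bigr)\, dt \\
&= \int_a^b (\alpha x(t) + \beta y(t))\left(\delta\,\frac{dx}{dt} + \theta\,\frac{dy}{dt}\right) dt \\
&= \alpha\delta \int_a^b x(t)\,\frac{dx}{dt}\, dt + \beta\theta \int_a^b y(t)\,\frac{dy}{dt}\, dt + \alpha\theta \int_a^b x(t)\,\frac{dy}{dt}\, dt + \beta\delta \int_a^b y(t)\,\frac{dx}{dt}\, dt \\
&= \frac{\alpha\delta}{2}\bigl(x^2(b) - x^2(a)\bigr) + \frac{\beta\theta}{2}\bigl(y^2(b) - y^2(a)\bigr) + \alpha\theta \int_a^b x(t)\,\frac{dy}{dt}\, dt - \beta\delta \int_a^b x(t)\,\frac{dy}{dt}\, dt \\
&= (\alpha\theta - \beta\delta) \int_a^b x(t)\,\frac{dy}{dt}\, dt \\
&= \det(A) \cdot (\text{area of } C)
\end{aligned}$$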


In this string of equalities, we made use of the two facts

$$\frac{1}{2}\bigl(x^2(b) - x^2(a)\bigr) = 0, \qquad \frac{1}{2}\bigl(y^2(b) - y^2(a)\bigr) = 0$$

because C is a closed curve. Furthermore, in this string of equalities, we
also used the fact that

$$\int_a^b y(t)\,\frac{dx}{dt}\, dt = -\int_a^b x(t)\,\frac{dy}{dt}\, dt$$

which is Green’s theorem as stated previously, in equation (10.9).


If the curve T(C) is clockwise, then a similar calculation gives

$$\text{Area of } T(C) = -\det(A) \cdot (\text{Area of } C)$$

Thus, the linear map T maintains the same orientation between the two
simple closed curves C and T(C) if det(A) > 0, and switches orientation if
det(A) < 0. If det(A) = 0, then the curve T(C) is not a simple closed curve.
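
The role of the sign of det(A) can be seen by evaluating the first integral in (10.9) without correcting for orientation. The sketch below is an added illustration: the name signedArea is hypothetical, and it reuses the circle and the matrix from Example 10.1.4.

```mathematica
(* Hypothetical helper: the integral of x(t) y'(t) over [a, b]; positive for a
   counterclockwise curve, negative for a clockwise one. *)
signedArea[{x_, y_}, {t_, a_, b_}] := Integrate[x D[y, t], {t, a, b}]

A = {{-2, 7}, {5, 3}};
CP = {3 + 6 Cos[t], 8 + 6 Sin[t]};   (* the circle of Example 10.1.4 *)
TCP = A.CP;                          (* its image under T *)

{signedArea[CP, {t, 0, 2 Pi}], signedArea[TCP, {t, 0, 2 Pi}]}
(* {36 π, -1476 π} *)

signedArea[TCP, {t, 0, 2 Pi}]/signedArea[CP, {t, 0, 2 Pi}] == Det[A]
(* True *)
```

The signed integral is multiplied by det(A) = -41 itself, so the negative result records the switch from counterclockwise to clockwise traversal.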


Homework Problems 


1. Find a linear map that maps the unit square to the parallelogram
with vertices at (0, 0), (2, 1), (1, 3), and (3, 4). Express your answer
in matrix form.


2. Using the map from problem 1, verify that the area of the 
transformed parallelogram is equal to the determinant of the matrix 
A corresponding to the linear map. 

3. Find a linear map that maps the unit square to the parallelogram
with vertices at (0, 0), (-3, 3), (-5, -2), and (-8, 1). You can express
your answer in matrix form.


4. Using the map from problem 3, verify that the area of the
transformed parallelogram is equal to the determinant of the matrix
A corresponding to the linear map.


5. Find a linear map that maps the parallelogram with vertices at
(0, 0), (2, 1), (1, 3), and (3, 4) to the parallelogram with vertices at
(0, 0), (-3, 3), (-5, -2), and (-8, 1).

6. Explain why there does not exist a linear map that maps the unit
square to the parallelogram with vertices at (1, 1), (3, 1), (2, 4), and
(4, 4).


7. Find the matrix corresponding to the linear map that rescales the 
unit square by a factor a > 0 in both the x- and y-directions. 


8. Find the matrix $A_x$ corresponding to the linear map that flips the
unit square about the x-axis.


9. Find the matrix $A_y$ corresponding to the linear map that flips the
unit square about the y-axis.


10. Is it true that the matrix $A_x A_y$ corresponds to the linear map that
flips the unit square about both the x- and y-axes?


11. (a) Find the arclength of the spiral parametric curve

$$C: (x(t), y(t)) = (t\cos(t), t\sin(t)), \quad t \in [0, 2\pi]$$

You will need to make use of the following equation:

$$\int_0^{2\pi} \sqrt{1 + t^2}\; dt = \pi\sqrt{1 + 4\pi^2} + \frac{1}{2}\ln\left(2\pi + \sqrt{1 + 4\pi^2}\right)$$

(b) Find the arclength of the new parametric curve T(C)
obtained by applying to C the linear map $T: \mathbb{R}^2 \to \mathbb{R}^2$ with
rule $T((x, y)) = (3x - 5y, 5x + 3y)$. Are the two arclengths
related as expected?
(c) Find the arclength of the new parametric curve S(C)
obtained by applying to C the linear map $S: \mathbb{R}^2 \to \mathbb{R}^2$ with
rule $S((x, y)) = (ax - by, bx + ay)$, $a, b \in \mathbb{R}$. Is this arclength
related to that of C as expected, and why might you expect it to
be true from the form of the rule for this linear map S?


12. Find the arclength and the area inside of the astroid parametric 
curve from Mathematica problem 3. 


Mathematica Problems 


1. Consider the figure-eight curve given parametrically by
$$C: (x(t), y(t)) = (2\cos(t), 2\sin(2t)), \quad t \in [0, 2\pi]$$


and the linear map matrix 


a41] 


Graph the parametric function and its corresponding transform. 
Compute the area enclosed by the original curve and the 
transformed curve, then determine whether equation (10.10) holds. 
2. Repeat problem 1 except with 




l -1 1 
Amal a i] 
3. An astroid is defined parametrically by

$$(x(t), y(t)) = (\cos^3(t), \sin^3(t)), \quad t \in [0, 2\pi]$$


Transform the astroid using the matrix 
-1 1 


then graph both the parametric function and its corresponding 
transform. Compute the area enclosed by the original curve and the 
transformed curve and determine whether equation (10.10) holds. 

4. Repeat problem 3 with 


1 1 1 
anal i] 
5. Construct a map matrix A that rotates the astroid from problem 3
through an angle of $\frac{\pi}{4}$, with the resulting astroid having five times
the area of the original astroid.
6. Redo homework problem 11, plotting the original C and the new
T(C).
7. Using the spiral parametric curve C of homework problem 11, 
find the approximate arclength of the new parametric curve T (C) 
obtained by applying to C the linear map $T: \mathbb{R}^2 \to \mathbb{R}^2$ with rule
$T((x, y)) = (3x + 5y, 7x - 18y)$. Are the two arclengths related as
expected? Plot both C and T (C). 
8. Repeat problem 7 for the astroid parametric curve of problem 3. 
9. Repeat problem 7 for the bat parametric curve 


$$C: (x(t), y(t)) = (r(t)\cos(t), r(t)\sin(3t)), \quad t \in [0, 200\pi]$$


where 


t 1 t sft 1 4 t 
r(t) =e") — 2cos(4t) + cos (4) - 5 sin (5) + sin” (=) - 5 cos (5) 


Plot this bat. Also, see what curve C you get if the coefficient 3 is
changed to 2 or 1 in sin(3t).

10. Let C be the simple closed curve given as the regular pentagon 
P inscribed in the unit circle. 


(a) Find P’s perimeter and enclosed area. 


(b) Now transform this pentagon P by the linear map T given 
by multiplication by the matrix 


3 -l 
jit 
into the new pentagon T (P). 


(c) Find the perimeter and enclosed area of T (P). How are 
these quantities related to those of P? 


(d) Plot both the pentagons P and T (P) in different colors. 


10.2 The Decomposition of
Linear Maps into Rotations,
Reflections, and Rescalings
in $\mathbb{R}^2$

The matrix $A_\theta$ corresponding to a rotation about the origin from the positive
x-axis by an angle of $\theta$ is given by

(10.11) $$A_\theta = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}$$


Linear maps can also be used to rescale objects; hence we will call a map
a rescaling if it does exactly that. For instance, if we wish to rescale
objects horizontally by a factor a > 0 and vertically by a factor b > 0, then
$T((x, y)) = (ax, by)$. The matrix A corresponding to this map is much
easier to determine, and is given by

(10.12) $$A_{(a,b)} = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$$

Finally, linear maps can reflect objects across the x-axis by $T((x, y)) = (x, -y)$
or about the y-axis by $T((x, y)) = (-x, y)$, with matrices given,
respectively, by

(10.13) $$A_x = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad A_y = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$$

The amazing fact is that all linear maps are simply composites of these
three basic types of linear maps: rotations $T_\theta$ about the origin through the
angle $\theta$, general rescalings $T_{(a,b)}$ by factors of a horizontally and b
vertically, and reflections such as $T_x$ about the x-axis.


Our first question is: Do these three basic types of linear maps commute? 
Let us test and see: 


(R = {{Cos[θ], -Sin[θ]}, {Sin[θ], Cos[θ]}}) // MatrixForm

Cos[θ]  -Sin[θ]
Sin[θ]   Cos[θ]

(S = {{a, 0}, {0, b}}) // MatrixForm

a  0
0  b

(ReflectX = {{1, 0}, {0, -1}}) // MatrixForm

1   0
0  -1

R.S // MatrixForm

a Cos[θ]  -b Sin[θ]
a Sin[θ]   b Cos[θ]

S.R // MatrixForm

a Cos[θ]  -a Sin[θ]
b Sin[θ]   b Cos[θ]


From the above two matrix multiplications, we see that rotations and 
rescalings do not commute. 


R.ReflectX // MatrixForm

Cos[θ]   Sin[θ]
Sin[θ]  -Cos[θ]

ReflectX.R // MatrixForm

 Cos[θ]  -Sin[θ]
-Sin[θ]  -Cos[θ]


Also, rotations and reflection about the x-axis do not commute. 


S.ReflectX // MatrixForm

a   0
0  -b

ReflectX.S // MatrixForm

a   0
0  -b

Now we see that reflections about the x-axis and rescalings commute. Next,
we want to know whether two of the same type commute:


(TrigFactor[R.(R /. {θ -> φ})]) // MatrixForm

Cos[θ + φ]  -Sin[θ + φ]
Sin[θ + φ]   Cos[θ + φ]

(TrigFactor[(R /. {θ -> φ}).R]) // MatrixForm

Cos[θ + φ]  -Sin[θ + φ]
Sin[θ + φ]   Cos[θ + φ]

(S.(S /. {a -> c, b -> d})) // MatrixForm

a c   0
 0   b d

((S /. {a -> c, b -> d}).S) // MatrixForm

a c   0
 0   b d

We have verified with Mathematica that the composite of two rotations is 
another rotation and they commute, which you were asked to prove in 
homework problem 3 of Section 4.4. The same holds true for two 


rescalings. Now we want to see how we could write an arbitrary linear
map T as a composite of these three basic types of linear maps.


First, we shall look at this on the level of matrix multiplication instead of 
the composite of linear maps since they are equivalent operations to each 
other. Then, our rescaling matrices $A_{(a,b)}$ for positive scalars a and b must
be generalized to have entries a and b that can be any real numbers. This
is not really much of a generalization except to allow the entries a and b to
be either, or both, zero since

$$A_{(1,-1)} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

represents the reflection about the x-axis,

$$A_{(-1,1)} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$$

represents the reflection about the y-axis and

$$A_{(-1,-1)} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$$

represents the reflection about the origin.


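Using all of the information that we have gathered thus far, we now
attempt to decompose an arbitrary linear map given in matrix form by

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

The first step is to recognize that the matrix A can be expressed as

(10.14) $$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} R\cos(\theta) & Q\cos(\phi) \\ R\sin(\theta) & Q\sin(\phi) \end{pmatrix}$$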


if each column of A is treated as a point in the xy-plane at distances R and
Q from the origin with angles $\theta$ and $\phi$ measured from the positive x-axis,
respectively, where

(10.15) $$R = \sqrt{a^2 + c^2}, \quad Q = \sqrt{b^2 + d^2}, \quad \theta = \tan^{-1}\left(\frac{c}{a}\right), \quad \phi = \tan^{-1}\left(\frac{d}{b}\right)$$

provided $a, b \ne 0$. So it appears that we have rewritten our original matrix
A in a more complex form; however, the point was to decompose the 
matrix, of which we now have the first step: 


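(10.16) $$\begin{pmatrix} R\cos(\theta) & Q\cos(\phi) \\ R\sin(\theta) & Q\sin(\phi) \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \cos(\phi) \\ \sin(\theta) & \sin(\phi) \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & Q \end{pmatrix}$$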


The second matrix on the RHS of this equation is a rescaling, but the first 
matrix is not a rotation. So we need to rewrite the first matrix as a product
of rotations. To do this, we use the substitution $B = \frac{1}{2}(\theta + \phi)$ and
$C = \frac{1}{2}(\theta - \phi)$, which in terms of $\theta$ and $\phi$ is $\theta = B + C$ and
$\phi = B - C$. So we can write


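(10.17) $$\begin{pmatrix} \cos(\theta) & \cos(\phi) \\ \sin(\theta) & \sin(\phi) \end{pmatrix} = \begin{pmatrix} \cos(B + C) & \cos(B - C) \\ \sin(B + C) & \sin(B - C) \end{pmatrix}$$

and with the usual trigonometric identities:

(10.18) $$\begin{aligned}
&= \begin{pmatrix} \cos(B)\cos(C) - \sin(B)\sin(C) & \cos(B)\cos(C) + \sin(B)\sin(C) \\ \sin(B)\cos(C) + \cos(B)\sin(C) & \sin(B)\cos(C) - \cos(B)\sin(C) \end{pmatrix} \\
&= \begin{pmatrix} \cos(B) & -\sin(B) \\ \sin(B) & \cos(B) \end{pmatrix} \begin{pmatrix} \cos(C) & \cos(C) \\ \sin(C) & -\sin(C) \end{pmatrix}
\end{aligned}$$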


The first matrix on the last line is just a simple rotation; however, the 
second is not in the standard form of a rotation. So we now turn our 
attention to the second matrix above. Note that 


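(10.19) $$\begin{pmatrix} \cos(C) & \cos(C) \\ \sin(C) & -\sin(C) \end{pmatrix} = \begin{pmatrix} \cos(C) & 0 \\ 0 & \sin(C) \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$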


where the first matrix on the RHS of equation (10.19) is simply a rescaling 
by the value of cos(C) in the x-direction and sin(C) in the y-direction. The 
right matrix, however, is not in the form of one of the three types, but we 
can decompose it as follows: 


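(10.20) $$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \sqrt{2} \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

where

$$\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \cos\left(\frac{\pi}{4}\right) & -\sin\left(\frac{\pi}{4}\right) \\ \sin\left(\frac{\pi}{4}\right) & \cos\left(\frac{\pi}{4}\right) \end{pmatrix}$$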


We can now represent our original matrix A in terms of rotations and 
rescalings: 


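(10.21) $$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \cos(B) & -\sin(B) \\ \sin(B) & \cos(B) \end{pmatrix} \begin{pmatrix} \sqrt{2}\cos(C) & 0 \\ 0 & \sqrt{2}\sin(C) \end{pmatrix} \begin{pmatrix} \cos\left(\frac{\pi}{4}\right) & -\sin\left(\frac{\pi}{4}\right) \\ \sin\left(\frac{\pi}{4}\right) & \cos\left(\frac{\pi}{4}\right) \end{pmatrix} \begin{pmatrix} R & 0 \\ 0 & -Q \end{pmatrix}$$

where

(10.22) $$R = \sqrt{a^2 + c^2}, \quad Q = \sqrt{b^2 + d^2}, \quad B = \frac{\theta + \phi}{2}, \quad C = \frac{\theta - \phi}{2}$$

$$a = R\cos(\theta), \quad c = R\sin(\theta), \quad b = Q\cos(\phi), \quad d = Q\sin(\phi)$$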


So the matrix A can be written as a product of two rotations and two 
general rescalings, where general rescalings include the reflections about 
the x-axis, y-axis, and the origin, as well as normal positive rescalings. In 
particular, when we perform the following matrix multiplication

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

from a geometric perspective, we are first rescaling the point (x, y) by a
factor of R in the x-direction, and -Q in the y-direction, then rotating by an
angle of $\frac{\pi}{4}$, then rescaling again by a factor of $\sqrt{2}\cos(C)$ in the
x-direction and $\sqrt{2}\sin(C)$ in the y-direction, after which we finally rotate
through an angle of B to end up at the point (ax + by, cx + dy). To find the
values of R, Q, B, and C, we use the formulas from (10.22).


Example 10.2.1. Now, let us do an example of this decomposition with

$$A = \begin{pmatrix} -8 & 5 \\ 3 & -1 \end{pmatrix}$$

ColA1 = {{-8}, {3}}; ColA2 = {{5}, {-1}};
R = Norm[ColA1]

√73

Q = Norm[ColA2]
√26

θ = N[Pi + ArcTan[-3/8]]
2.78282
{R Cos[θ], R Sin[θ]}
{-8., 3.}
φ = N[ArcTan[-1/5]]
-0.197396
{Q Cos[φ], Q Sin[φ]}
{5., -1.}
B = 1/2 (θ + φ)
1.29271
Cv = 1/2 (θ - φ)
1.49011

{{Cos[B], -Sin[B]}, {Sin[B], Cos[B]}}.{{Sqrt[2] Cos[Cv], 0}, {0, Sqrt[2] Sin[Cv]}}.
{{Cos[Pi/4], -Sin[Pi/4]}, {Sin[Pi/4], Cos[Pi/4]}}.{{R, 0}, {0, -Q}} // MatrixForm

-8.   5.
 3.  -1.
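Now we see directly that our decomposition of a linear map T into a product of rotations
and general rescalings works. In particular, we can write T as $T = T_1 \circ T_2 \circ T_3 \circ T_4$, or in
matrix form as $A = A_1 A_2 A_3 A_4$ with

$$A_1 = \begin{pmatrix} \cos(1.292713212) & -\sin(1.292713212) \\ \sin(1.292713212) & \cos(1.292713212) \end{pmatrix}, \qquad A_2 = \begin{pmatrix} \sqrt{2}\cos(1.490108772) & 0 \\ 0 & \sqrt{2}\sin(1.490108772) \end{pmatrix}$$

$$A_3 = \begin{pmatrix} \cos\left(\frac{\pi}{4}\right) & -\sin\left(\frac{\pi}{4}\right) \\ \sin\left(\frac{\pi}{4}\right) & \cos\left(\frac{\pi}{4}\right) \end{pmatrix}, \qquad A_4 = \begin{pmatrix} \sqrt{73} & 0 \\ 0 & -\sqrt{26} \end{pmatrix}$$

This decomposition cannot be reordered, as we check below by swapping the last two factors.

{{Cos[B], -Sin[B]}, {Sin[B], Cos[B]}}.{{Sqrt[2] Cos[Cv], 0}, {0, Sqrt[2] Sin[Cv]}}.{{R,
0}, {0, -Q}}.{{Cos[Pi/4], -Sin[Pi/4]}, {Sin[Pi/4], Cos[Pi/4]}} // MatrixForm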


5.07622 4.69814 
—0.733001 —2.05738 
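
The steps of Example 10.2.1 can be collected into one function that returns the four factors of (10.21). This is a sketch under stated assumptions, not the text's own code: the names rot and decompose are hypothetical, and the two-argument form ArcTan[x, y] is used in place of the quadrant adjustment done by hand in the example. It assumes neither column of the matrix is the zero vector, since ArcTan[0, 0] is indeterminate.

```mathematica
(* Hypothetical helpers, not from the text *)
rot[t_] := {{Cos[t], -Sin[t]}, {Sin[t], Cos[t]}}

decompose[{{a_, b_}, {c_, d_}}] :=
 Module[{r, q, th, ph, bb, cc},
  r = Norm[{a, c}]; q = Norm[{b, d}];      (* R and Q of (10.22) *)
  th = ArcTan[a, c]; ph = ArcTan[b, d];    (* polar angles of the two columns *)
  bb = (th + ph)/2; cc = (th - ph)/2;      (* B and C of (10.22) *)
  {rot[bb], DiagonalMatrix[Sqrt[2] {Cos[cc], Sin[cc]}], rot[Pi/4],
   DiagonalMatrix[{r, -q}]}]

(* The matrix of Example 10.2.1: multiply the factors back together *)
factors = decompose[N[{{-8, 5}, {3, -1}}]];
Chop[Dot @@ factors - {{-8, 5}, {3, -1}}]
(* {{0, 0}, {0, 0}} *)
```

Using ArcTan[x, y] also covers columns with a zero first entry, a case excluded by the condition $a, b \ne 0$ in (10.15).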


Homework Problems 


1. Express each of the following linear map matrices as the product 
given in (10.21): 


v2 v2 
al 23] w E | : g 
zA 0 -5V2 -1 =i 7 


2. In Example 10.2.1, it was shown that the decomposition could not
be reordered in the fashion $A_1 A_2 A_4 A_3$. Determine whether any of
the following reorderings agree with the decomposition $A_1 A_2 A_3 A_4$:

(a) $A_2 A_1 A_3 A_4$ (b) $A_4 A_3 A_2 A_1$ (c) $A_3 A_2 A_1 A_4$ (d) $A_2 A_3 A_1 A_4$


3. Consider the map that takes a point (x, y) to the point (0, x - y).
Express the map as the product $A_1 A_2 A_3 A_4$.


4. If $A \in \mathbb{R}^{2 \times 2}$ is not invertible, can it still be decomposed into the
product of the four matrices?


5. Explain geometrically what the map given by 


does to a point (x, y), and then decompose A.
6. Let T be the linear map that is first the rotation about the origin 


through the angle a = 6™ followed by the reflection about the 


y-axis, and then the rescaling by a factor of 7 in the x-direction.
Find the matrix A that represents this linear map T, and then
decompose it according to formula (10.21).
7. (a) Explain if the inverse of a rotation, rescaling, or reflection is 
also of the same type, and if it is, give a formula for this inverse that 
clearly preserves the type. 
(b) Explain what happens if the inverse of a rotation, rescaling 
or reflection does not exist or is not of the same type. 
8. Decompose the 2 x 2 off-diagonal matrix 


$$A = \begin{pmatrix} 0 & a \\ b & 0 \end{pmatrix}$$
9. If you know the decomposition according to formula (10.21) for 
a matrix A, then is there any simple way to get the decomposition 
for A”? Explain in detail. 
10. If we switch to using complex numbers from $\mathbb{C}$ instead of real
numbers in $\mathbb{R}$, then does a linear map $T: \mathbb{C}^2 \to \mathbb{C}^2$ and its
corresponding 2 x 2 complex matrix $A \in \mathbb{C}^{2 \times 2}$ representing it have
a similar decomposition to the real case discussed in this section? In 


particular, are there rotations, reflections, and rescalings in the 
complex case similar to the real case? 


11. Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be a linear map with the matrix

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & a & b \\ 0 & c & d \end{pmatrix}$$

representing it for $a, b, c, d \in \mathbb{R}$.
(a) Decompose A using the methodology of this section, and
explain your results geometrically in $\mathbb{R}^3$.
(b) Apply the results of part (a) to the matrix


1 0 0 
A=|]0 -1 3 
0 S 5 


Mathematica Problems 


1. Express each of the following linear map matrices as the product 
given in (10.21). 


vra wigs] wlan 


3 73 9 2 -4 
ofi d of? e o| i 
5 4 3 


l 
an aa 
a) 


5— 2i -l +i 
34+8: 2-5i 


(a) Write A as A = B + Ci, where B and C are the real 2 x 2 
matrices that are the real and imaginary parts of A. Now 
decompose B and C according to formula (10.21). 

(b) Does this tell us anything about the geometric nature of the
linear map $T: \mathbb{C}^2 \to \mathbb{C}^2$, which has the matrix A representing it?
Explain.


10.3 The Effect of Linear
Maps on Volume, Area, and
Arclength in $\mathbb{R}^3$


Now we switch from two dimensions to three, and linear maps $T: \mathbb{R}^3 \to \mathbb{R}^3$
with corresponding 3 x 3 matrix representations. Naturally, we want
to see how things stay the same, or change, as we move to a higher
dimension. As with the case of maps in $\mathbb{R}^2$, T must take a line segment to
another line segment, since this is independent of dimension. But does this 
determine T uniquely as in two dimensions? At this point you should take 
a guess before moving on to the next paragraph. 


It seems that it should not. This call is made because the same formula in 
two dimensions will no longer work, since we do not have a square matrix 


that we can invert. Thus, we need three points, and so a triangle in $\mathbb{R}^3$ is
needed to replace the line segment of $\mathbb{R}^2$. As with the $\mathbb{R}^2$ case before, we
require that no line segment making up the triangle lie on a line through
the origin. So if T takes the triangle
$T_1 = \{(x_1, y_1, z_1), (x_2, y_2, z_2), (x_3, y_3, z_3)\}$ in its domain to another
triangle $T_2 = \{(u_1, v_1, w_1), (u_2, v_2, w_2), (u_3, v_3, w_3)\}$ in its range,
then our two-dimensional formula suggests that the standard matrix A
representing the linear map T in three dimensions is

(10.23) $$A = \begin{pmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{pmatrix} \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix}^{-1}$$


Example 10.3.1. We now verify this formula with an example in which we find the
standard matrix A representing the linear map T, where

$$T((-4, 7, 3)) = (5, 2, -9), \quad T((11, -6, 2)) = (-1, 3, 8), \quad T((9, 13, -5)) = (3, -6, 7)$$

The triangle and its transform are depicted in Figure 10.7.


P = {-4, 7, 3}; Q = {11, -6, 2}; R = {9, 13, -5};
(PQR = Transpose[{P, Q, R}]) // MatrixForm

-4  11   9
 7  -6  13
 3   2  -5

TP = {5, 2, -9}; TQ = {-1, 3, 8}; TR = {3, -6, 7};
(TPTQTR = Transpose[{TP, TQ, TR}]) // MatrixForm

 5  -1   3
 2   3  -6
-9   8   7

(A = TPTQTR.Inverse[PQR]) // MatrixForm

  7/181    165/362     237/362
 19/543   -121/1086   1057/1086
130/181    -71/181    -204/181

{A.P, A.Q, A.R}

{{5, 2, -9}, {-1, 3, 8}, {3, -6, 7}}
TrianglePQR = Graphics3D [{Blue,Opacity [.5] ,Polygon[{P,Q,R}]}]; 
TriangleTPTQTR = Graphics3D[{Red, Opacity[.5], Polygon[{TP, TQ, TR}]}]; 


TxtPlots = Graphics3D[{Black, Text[“P”, {—4, 7, 4}], Text[“Q”, {11, —6, 3}] , 
Text[“R”, {9, 13, —6}], Text[“TP”, {5, 2, -10}], Text[“TQ”, {-1, 3, 9}] , Text[“TR”, 
{3, —6, 8313]; 


Show [TrianglePQR, TriangleTPTQTR, TxtPlots] 


Figure 10.7: Triangle PQR, pointing left, is transformed to triangle
T(P)T(Q)T(R), pointing down, by T.


This example justifies our belief that a linear map T on Euclidean
three-space will be uniquely determined by taking a triangle to another
triangle in three-space.
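
Formula (10.23) can be wrapped in a small reusable function. The following is only a sketch: the names mapFromTriangles, src, img, and m are hypothetical and do not appear in the text, and the determinant test is an added safeguard for the case in which the three corners do not give an invertible matrix.

```mathematica
(* Hypothetical helper (not from the text): each triangle is a list of three
   corner points; the corners become the columns of a 3 x 3 matrix as in (10.23). *)
mapFromTriangles[src_, img_] := Module[{s = Transpose[src]},
  If[Det[s] == 0,
   $Failed,                              (* corners do not determine T *)
   Transpose[img] . Inverse[s]]]

src = {{1, 0, 0}, {1, 1, 0}, {1, 1, 1}};
img = {{2, 0, 0}, {2, 3, 0}, {2, 3, 4}};
m = mapFromTriangles[src, img]           (* {{2, 0, 0}, {0, 3, 0}, {0, 0, 4}} *)

(* each source corner is sent to the matching image corner *)
Table[m . src[[k]] == img[[k]], {k, 3}]  (* {True, True, True} *)
```

Lowercase names are used so that the sketch does not overwrite the symbols P, Q, R, and A from Example 10.3.1.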


Example 10.3.2. Now, let us see how this linear map T transforms the length of these 
three line segments and the areas of these two triangles. How do you think T will behave? 
Also, we shall see how T affects the volume of a parallelepiped. 


LengthPQ = N[Norm[Q - P]]
19.8746

LengthTPTQ = N[Norm[TQ - TP]]
18.0555

LengthTPTQ/LengthPQ

0.908469

N[{Det[A], Sqrt[Abs[Det[A]]], (Abs[Det[A]])^(1/3)}]
{0.399632, 0.632164, 0.73658}
LengthPR = N[Norm[R - P]]
16.4012

LengthTPTR = N[Norm[TR - TP]]
18.

LengthTPTR/LengthPR

1.09748

LengthQR = N[Norm[R - Q]]
20.347

LengthTQTR = N[Norm[TR - TQ]]
9.89949

LengthTQTR/LengthQR

0.486534


(LengthTPTQ/LengthPQ + LengthTPTR/LengthPR + LengthTQTR/LengthQR)/3


0.830827


It seems that the average effect of T on these line segment lengths is 0.8308274317, 
which is approximately the cube root of |det(A)|. Now, let us check on the areas of these 
two triangles. 


AreaPQR = 0.5 Norm[Cross[Q - P, R - P]]

150.524

AreaTPTQTR = 0.5 Norm[Cross[TQ - TP, TR - TP]]
85.8021

AreaTPTQTR/AreaPQR

0.570022

N[{Sqrt[Abs[Det[A]]], (Abs[Det[A]])^(2/3)}]
{0.632164, 0.54255}


The linear map T seems to change area by a factor of approximately the 2/3 power of
|det(A)|. Now we want to check how T changes the volume of the parallelepiped with three
adjacent sides given by the position vectors for the points P, Q, and R. The volume of the
parallelepiped whose adjacent sides are the vectors P, Q, and R is

\[
|\vec{P} \cdot (\vec{Q} \times \vec{R})| = |\det(PQR)|
\]

where the points P, Q, and R make up the rows of the matrix PQR. Figure 10.8 is a
depiction of the parallelepiped and its transform.


ParallelepipedPQR = Graphics3D[{Blue, Opacity[.5], Polygon[{{0, 0, 0}, Q, P+Q, P}],
Polygon[{Q, R+Q, P+Q+R, P+Q}], Polygon[{R+Q, P+Q+R, P+R, R}], Polygon[{P,
P+Q, P+Q+R, P+R}], Polygon[{{0, 0, 0}, R, P+R, P}], Polygon[{{0, 0, 0}, Q, R+Q,
R}]}];


ParallelepipedTPTQTR = Graphics3D[{Red, Opacity[.5], Polygon[{{0, 0, 0}, TQ,
TP+TQ, TP}], Polygon[{TQ, TR+TQ, TP+TQ+TR, TP+TQ}], Polygon[{TR+TQ,
TP+TQ+TR, TP+TR, TR}], Polygon[{TP, TP+TQ, TP+TQ+TR, TP+TR}],
Polygon[{{0, 0, 0}, TR, TP+TR, TP}], Polygon[{{0, 0, 0}, TQ, TR+TQ, TR}]}];


Show[ParallelepipedPQR, ParallelepipedTPTQTR, TxtPlots]


Figure 10.8: Parallelepiped defined by vectors P, Q, and R, is
transformed to the parallelepiped defined by vectors TP, TQ, and TR.

VolumePQR = Abs[Det[{P, Q, R}]] 

1086 

VolumeTPTQTR = Abs[Det[{TP, TQ, TR}]] 
434 

N[{ VolumeTPTQTR/VolumePQR, Det[A]}] 
{0.399632, 0.399632} 
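
The agreement of the two numbers above is not special to this example. As a short worked check (a sketch that uses only the multiplicativity of the determinant, with $[\,\vec{P}\ \vec{Q}\ \vec{R}\,]$ denoting the matrix whose columns are the three edge vectors):

```latex
\[
\bigl|\det[\,A\vec{P}\ \ A\vec{Q}\ \ A\vec{R}\,]\bigr|
= \bigl|\det\bigl(A\,[\,\vec{P}\ \ \vec{Q}\ \ \vec{R}\,]\bigr)\bigr|
= |\det A|\,\bigl|\det[\,\vec{P}\ \ \vec{Q}\ \ \vec{R}\,]\bigr|,
\qquad 434 = \tfrac{434}{1086}\cdot 1086 .
\]
```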


The linear map T changes volume by the factor |det(A)|. Now, let us see if 
this is also true for the volume of a solid inside a simple closed surface S. 
A simple closed surface S is one that does not intersect itself and has no 
boundary curve; that is, it is a surface that contains a simply connected 
(one connected piece without any holes) solid inside. A sphere, ellipsoid, 
infinite (in both directions) cylinder, hyperboloid of one sheet, paraboloid, 
and infinite cone are all examples of simple closed surfaces. 




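From the divergence theorem of multivariable calculus, the volume V of
the solid inside a simple closed surface S is given by

\[
V = \frac{1}{3} \int_c^d \int_a^b \vec{r}(u,v) \cdot \left( \vec{r}_u \times \vec{r}_v \right) du\, dv
\tag{10.24}
\]

where the surface S is given by its position vector field $\vec{r}(u,v)$ for
$a \le u \le b$ and $c \le v \le d$. Also,
$\vec{r}_u = \frac{\partial}{\partial u} \vec{r}(u,v)$ and
$\vec{r}_v = \frac{\partial}{\partial v} \vec{r}(u,v)$ are the
u- and v-direction tangent vector fields to S at $\vec{r}(u,v)$, respectively.
We also have that

\[
\vec{r}(u,v) \cdot \left( \vec{r}_u \times \vec{r}_v \right)
= \det \begin{pmatrix} \vec{r} \\ \vec{r}_u \\ \vec{r}_v \end{pmatrix}
\tag{10.25}
\]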


Example 10.3.3. Let us test this formula out on the following example. Let the linear
map $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the one with standard 3 x 3 matrix A
representing T given by

\[
A = \begin{pmatrix} -8 & 5 & -3 \\ 7 & 2 & -9 \\ 1 & 6 & 4 \end{pmatrix}
\]

Let T send the sphere S with center at (8,2,4) and radius 5 to the surface T(S). S has a
position vector field defined by

\[
\vec{r}(u,v) = (5\cos(u)\sin(v) + 8,\ 5\sin(u)\sin(v) + 2,\ 5\cos(v) + 4)
\]

for $u \in [0, 2\pi]$ and $v \in [0, \pi]$. This parametric surface definition of a sphere
can be found in most multivariable calculus books.

A = {{-8, 5, -3}, {7, 2, -9}, {1, 6, 4}};

R = {5 Cos[u] Sin[v] + 8, 5 Sin[u] Sin[v] + 2, 5 Cos[v] + 4};
Ru = D[R, u]

{-5 Sin[u] Sin[v], 5 Cos[u] Sin[v], 0}

Rv = D[R, v]

{5 Cos[u] Cos[v], 5 Cos[v] Sin[u], -5 Sin[v]}

Integ = Simplify[Factor[TrigExpand[Det[{R, Ru, Rv}]]]]
-25 Sin[v] (5 + 4 Cos[v] + 8 Cos[u] Sin[v] + 2 Sin[u] Sin[v])
VolumeR = Abs[N[1/3 Integrate[Integ, {u, 0, 2 π}, {v, 0, π}]]]
523.599

N[4/3 π 5^3]

523.599

TR = A.R;

TRu = D[TR, u]


{25 Cos[u] Sin[v] + 40 Sin[u] Sin[v], 10 Cos[u] Sin[v] - 35 Sin[u] Sin[v], 30 Cos[u]
Sin[v] - 5 Sin[u] Sin[v]}


TRv = D[TR, v]


{-40 Cos[u] Cos[v] + 25 Cos[v] Sin[u] + 15 Sin[v], 35 Cos[u] Cos[v] + 10 Cos[v] Sin[u]
+ 45 Sin[v], 5 Cos[u] Cos[v] + 30 Cos[v] Sin[u] - 20 Sin[v]}


IntegT = Simplify[TrigExpand[Det[{TR, TRu, TRv}]]]

20025 Sin[v] (5 + 4 Cos[v] + 8 Cos[u] Sin[v] + 2 Sin[u] Sin[v])
VolumeTR = Abs[N[1/3 Integrate[IntegT, {u, 0, 2 π}, {v, 0, π}]]]
419403.

Abs[Det[A]] VolumeR

419403.

Abs[Det[A]]

801


We have the volume relationship as expected, with the linear map T causing a change in
volume by the factor of |det(A)|. Now, let us plot both of these solids and see that T(S) is
an ellipsoid as we might have expected. This is depicted in Figure 10.9.


PlotRTR = ParametricPlot3D[{R, TR}, {u, 0, 2 π}, {v, 0, π}, PlotStyle -> {Red, Blue},
Mesh -> None]


Figure 10.9: The sphere is transformed by T to the ellipsoid.


From multivariable calculus, the surface area $A_S$ of a simple closed surface S can be
computed by the formula

\[
A_S = \int_c^d \int_a^b \left| \vec{r}_u \times \vec{r}_v \right| du\, dv
\]

where the surface S is given by its position vector field $\vec{r}(u,v)$ for $u \in [a, b]$
and $v \in [c, d]$. Let us see how this works on our two surfaces. We hope once again to
find that the surface areas of the two are related by approximately the factor of the
square root of |det(A)|.


IntegR = Simplify[Sqrt[TrigExpand[Sum[(Cross[Ru, Rv][[k]])^2, {k, 1, 3}]]]]


25 Sqrt[Sin[v]^2]


SurfaceAreaR = Integrate[IntegR, {u, 0, 2 π}, {v, 0, π}]

100 π

4 π 5^2

100 π

IntegTR = Simplify[Sqrt[TrigExpand[Sum[(Cross[TRu, TRv][[k]])^2, {k, 1, 3}]]]]


(25/Sqrt[2]) Sqrt[Sin[v]^2 (15844 + 2435 Sin[2 u] - Cos[2 v] (1824 + 2435 Sin[2 u])
- 4050 Cos[2 u] Sin[v]^2 + 4910 Cos[u] Sin[2 v] + 3452 Sin[u] Sin[2 v])]


SurfaceAreaTR = NIntegrate[IntegTR, {u, 0, 2 π}, {v, 0, π}]

28211.6

SurfaceAreaTR/SurfaceAreaR

89.8004

N[{Sqrt[Abs[Det[A]]], Abs[Det[A]]^(2/3)}]

{28.3019, 86.2492}

As can be seen by the last three outputs above, the ratio of the surface area of the
transformed surface to that of the original is not close to the value of
$\sqrt{|\det(A)|}$, but is close to the 2/3 power of $|\det(A)|$ for this example.


Homework Problems 


1. Compute the matrix A corresponding to the linear map that takes 
triangle T1 to triangle T2 defined by the following corner points: 


(a) T1 = {(1, 1, 1), (1, -1, -2), (1, 1, -1)}
T2 = {(0, 1, 0), (2, 1, -2), (-2, 0, 1)}


(b) T1 = {(1, 2, 3), (-3, 2, -1), (1, 1, -2)}
T2 = {(-2, 0, 2), (0, 3, 0), (-1, 2, -3)}


(c) T1 = {(4, -6, 2), (-1, -1, -1), (1, 1, -1)}
T2 = {(3, 3, 2), (-1, -3, 6), (3, 8, -2)}


(d) T1 = {(-8, 3, 7), (-8, 3, 6), (-7, 3, 6)}
T2 = {(-2, 3, 4), (4, 3, -9), (-1, 3, 3)}


2. Verify equation (10.25). 
3. Given 


and T2 = {(3, 3, 2), (-1, -3, 6), (3, 8, -2)}, find the three
points which define the corners of the triangle T1 such that T2 is the
map of T1 by the matrix A.

4. Prove that the map defined by the matrix A given below 
preserves the length of the sides of any triangle. Geometrically, 
what does the matrix A represent? 


1 1 
= Mer. 
1 1 
Azni~a Z0 
0 0 1 


5. Prove that the map defined by the matrix A given below scales 


toler 


line segments in R: by a factor of ‘ skes ict E€ R?, |A Ti = 


6. Prove that the map defined by the matrix A given below 
preserves the length of the sides of any triangle: 


2 1 
4 0 FW 
A=] 0 VE 0 
1 2 
4% Ven 


7. Show that the rows of the matrix from problem 6 form an
orthonormal basis for $\mathbb{R}^3$.


8. Prove or disprove the following statement: If the rows of
$A \in \mathbb{R}^{3 \times 3}$ form an orthonormal basis, then the lengths
of the sides of the transformed triangle under the map A are the same as
that of the original.

9. Let C: (x, y, z) = (x(t), y(t), z(t)) for $t \in [a, b]$ be a
parametric space curve. The arclength $L_C$ of C is given by

\[
L_C = \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\; dt
\]

(a) Let C be the circular helix given by

\[
C: (x(t), y(t), z(t)) = (3\sin(t) + 7,\ 3\cos(t) - 4,\ 2t - 9), \quad t \in [0, 2\pi]
\]

Find the arclength of C.
(b) Let C be the general circular helix given by

\[
C: (x(t), y(t), z(t)) = (R\sin(t) + H,\ R\cos(t) + K,\ at + b), \quad t \in [0, 2\pi]
\]

Find the arclength of C.


10. (Use problem 9.) Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear map
represented by the matrix

\[
A = \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & -1 \end{pmatrix}
\]

(a) Show that the curves T(C) and C from problem 9 part (a)
have the same length.


(b) Show that the curves T(C) and C from problem 9 part (b)
have the same length.


11. Verify the surface area formula $4\pi^2 R r$ for the torus
(donut-shaped) simple closed parametric surface T given by

\[
T: (x(u,v), y(u,v), z(u,v)) = ((R + r\cos(u))\cos(v),\ (R + r\cos(u))\sin(v),\ r\sin(u))
\]

for $u, v \in [0, 2\pi]$. Can you find a simple formula for its volume in
terms of R and r?


Mathematica Problems 
1. Given the map defined by the matrix 


i 1 -2 
A=]| -2 0 l 
3 3 1 


for each of the following triangles, compute the lengths of all three 
sides and the area, and then perform the same computations with the 
transformed triangle: 


(a) T = {(1, -1, 1), (4, -1, -2), (1, 1, 0)},


(b) T = {(1, 0, 1), (4, 2, -3), (1, 6, -2)},
(c) r= { (7225), (-1,3,5), (3,0, -1)} } 


2. Graph both the original and transformed triangles from problem 1.


3. Perform the Gram-Schmidt orthonormalization process on the 
vectors formed by the rows of the following matrix: 


i: 
A=ziil1 2 -1 
2 3 2 


then use the new vectors to construct a new matrix B. 
4. Use the matrix B found in problem 3 to transform the following 


triangles: 
@ r= {{(1-1,1), (4, -1, ~2), (1,1, 0)} , 
(by T= ¢ 41,0, 1), (4, 2, 3), (1, 6, -2)} , 
o r= { E7225) (-1,3,5), (3,0, -1)} , 


Compute the lengths of all three sides and the area, and then 
perform the same computations with the transformed triangle. 


5. Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear map represented by the matrix

\[
A = \begin{pmatrix} -1 & 3 & 5 \\ 4 & 1 & -2 \\ 9 & -10 & 7 \end{pmatrix}
\]

For the parametric curve C in homework problem 9 part (a), find
the length of T(C) and compare it to that of C. Do they differ by a
factor of approximately $|\det(A)|^{1/3}$? Plot both the curves C and
T(C).

6. For the linear map T in Mathematica problem 5, find the surface
area and volume of the solid for the torus simple closed parametric
surface S and T(S) for S given by

\[
S: (x(u,v), y(u,v), z(u,v)) = ((9 + 4\cos(u))\cos(v),\ (9 + 4\cos(u))\sin(v),\ 4\sin(u))
\]

for $u, v \in [0, 2\pi]$. Are their surface areas and volumes related
through the value of det(A)? Plot both S and T(S).


10.4 Rotations, Reflections, 
and Rescalings in Three 
Dimensions 


In this section, we look at those linear maps $T: \mathbb{R}^3 \to \mathbb{R}^3$
that are rotations about lines through the origin, reflections about a
coordinate axis or the origin, and general rescalings. Let us begin by
discussing rotations in $\mathbb{R}^3$; in particular, we already know how to
rotate objects by an angle θ in the xy-plane. In three dimensions, a rotation
in the xy-plane means a rotation about the z-axis. Therefore, we need to
construct a 3 x 3 matrix that keeps all z values the same, and rotates the
(x, y) coordinates by an angle of θ. Your first guess should be to put the
2 x 2 rotation matrix in the upper-left corner of our hopeful matrix. The
question remains: What do we do about the last row and column? Well, we wish
for z to stay fixed; therefore we should use the third row and column of the
identity matrix. So our matrix looks as follows:

\[
A_{\theta_z} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix}
\tag{10.26}
\]

which we can verify works if we perform the multiplication
$A_{\theta_z} \vec{v}$, where $\vec{v} = (x, y, z) \in \mathbb{R}^3$:

\[
\begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x\cos(\theta) - y\sin(\theta) \\ x\sin(\theta) + y\cos(\theta) \\ z \end{pmatrix}
\]
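
The same check can be run symbolically. The following is a minimal sketch, assuming a helper named rotZ (a hypothetical name, not from the text) that builds the matrix in (10.26); it also confirms that the matrix is orthogonal with determinant 1, which is what we expect of a rotation.

```mathematica
(* Hypothetical helper: the z-axis rotation matrix of (10.26) *)
rotZ[t_] := {{Cos[t], -Sin[t], 0}, {Sin[t], Cos[t], 0}, {0, 0, 1}}

rotZ[t] . {x, y, z}                       (* third component is z, unchanged *)
Simplify[Transpose[rotZ[t]] . rotZ[t]]    (* {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}} *)
Simplify[Det[rotZ[t]]]                    (* 1 *)
```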


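The matrix $A_{\theta_z}$ can be easily modified to get the two other axis
rotation matrices:

\[
A_{\theta_x} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{pmatrix},
\quad
A_{\theta_y} = \begin{pmatrix} \cos(\theta) & 0 & -\sin(\theta) \\ 0 & 1 & 0 \\ \sin(\theta) & 0 & \cos(\theta) \end{pmatrix}
\tag{10.27}
\]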


Now we want to find the matrix $A_{\theta_{\vec{v}}}$ that allows us to rotate
about any line through the origin in three-space that is parallel to the unit
vector $\vec{v}$, where no component of $\vec{v}$ is zero. If any component of
$\vec{v}$ is zero, then we would simply be rotating about one of the axes. If
you analyze what we did in Section 7.1, you will find that the rotation matrix
$A_{\theta_{\vec{v}}}$ has columns obtained by revolving respectively each
column of the 3 x 3 identity matrix $I_3$:


v = {α, β, δ}; Q1 = {1, 0, 0};
Cntr1 = (v.Q1) v

{α^2, α β, α δ}

R1 = Sqrt[1 - α^2]

Sqrt[1 - α^2]

U1 = (Q1 - Cntr1)/R1

{Sqrt[1 - α^2], -(α β)/Sqrt[1 - α^2], -(α δ)/Sqrt[1 - α^2]}

W1 = Simplify[Cross[U1, v]] //. {β^2 + δ^2 -> 1 - α^2}

{0, -δ/Sqrt[1 - α^2], β/Sqrt[1 - α^2]}

Nv1 = Cntr1 + R1 Cos[θ] U1 + R1 Sin[θ] W1

{α^2 + (1 - α^2) Cos[θ], α β - α β Cos[θ] - δ Sin[θ], α δ - α δ Cos[θ] + β Sin[θ]}


This is the first column of the rotation matrix $A_{\theta_{\vec{v}}}$. Now let
us find its second and third columns:

Q2 = {0, 1, 0};

Cntr2 = (v.Q2) v

{α β, β^2, β δ}

R2 = Sqrt[1 - β^2]

U2 = (Q2 - Cntr2)/R2

{-(α β)/Sqrt[1 - β^2], Sqrt[1 - β^2], -(β δ)/Sqrt[1 - β^2]}

W2 = Simplify[Cross[U2, v]] //. {α^2 + δ^2 -> 1 - β^2}

{δ/Sqrt[1 - β^2], 0, -α/Sqrt[1 - β^2]}

Nv2 = Cntr2 + R2 Cos[θ] U2 + R2 Sin[θ] W2

{α β - α β Cos[θ] + δ Sin[θ], β^2 + (1 - β^2) Cos[θ], β δ - β δ Cos[θ] - α Sin[θ]}


Q3 = {0, 0, 1};
Cntr3 = (v.Q3) v
{α δ, β δ, δ^2}

R3 = Sqrt[1 - δ^2]

Sqrt[1 - δ^2]

U3 = (Q3 - Cntr3)/R3

{-(α δ)/Sqrt[1 - δ^2], -(β δ)/Sqrt[1 - δ^2], Sqrt[1 - δ^2]}

W3 = Simplify[Cross[U3, v]] //. {α^2 + β^2 -> 1 - δ^2}
{-β/Sqrt[1 - δ^2], α/Sqrt[1 - δ^2], 0}

Nv3 = Cntr3 + R3 Cos[θ] U3 + R3 Sin[θ] W3
{α δ - α δ Cos[θ] - β Sin[θ], β δ - β δ Cos[θ] + α Sin[θ], δ^2 + (1 - δ^2) Cos[θ]}

You may have noticed some strange assumptions in the simplification
commands above. These arose from the fact that we assumed the vector
$\vec{v} = (\alpha, \beta, \delta)$ to be of unit length, thus requiring
$\alpha^2 + \beta^2 + \delta^2 = 1$. Since $\vec{v}$ simply represents the
direction of the line about which we are to rotate, this is a reasonable
assumption. The full rotation matrix $A_{\theta_{\vec{v}}}$ is given below:


(RotMatrix = Transpose[{Nv1, Nv2, Nv3}]) // MatrixForm


α^2 + (1 - α^2) Cos[θ]          α β - α β Cos[θ] + δ Sin[θ]     α δ - α δ Cos[θ] - β Sin[θ]
α β - α β Cos[θ] - δ Sin[θ]     β^2 + (1 - β^2) Cos[θ]          β δ - β δ Cos[θ] + α Sin[θ]
α δ - α δ Cos[θ] + β Sin[θ]     β δ - β δ Cos[θ] - α Sin[θ]     δ^2 + (1 - δ^2) Cos[θ]


(RM1 = N[RotMatrix /. {α -> 2/Sqrt[38], β -> -5/Sqrt[38], δ -> 3/Sqrt[38],
θ -> 9 π/7}]) // MatrixForm


-0.452596    -0.807724   -0.377809
-0.0467447    0.444596   -0.894511
 0.89049     -0.387192   -0.238979


Chop[RM1.Transpose[RM1]] // MatrixForm


1.   0    0
0    1.   0
0    0    1.

The matrix above is an example with $\vec{v} = \frac{1}{\sqrt{38}}(2, -5, 3)$
and $\theta = \frac{9\pi}{7}$. The calculations above may seem lengthy,
tedious, and complicated; however, if we spend a few minutes working through
the details, perhaps we can demystify this whole process. Remember that $I_3$
is the 3 x 3 matrix whose columns are the standard basis vectors
$\{\vec{e}_1, \vec{e}_2, \vec{e}_3\}$ of $\mathbb{R}^3$. We will rotate
$\vec{e}_k$ about $\vec{v} = (\alpha, \beta, \delta)$ by an angle of θ, where
k = 1, 2, 3 and $\alpha^2 + \beta^2 + \delta^2 = 1$. Once again following the
pattern of Section 7.1, we must find the point C (or in vector form
$\vec{c}$) on $\vec{v}$ closest to the tip of vector $\vec{e}_k$. In vector
form, this is simply the projection of $\vec{e}_k$ onto $\vec{v}$:

\[
\vec{c} = \operatorname{proj}_{\vec{v}} \vec{e}_k = \frac{\vec{v} \cdot \vec{e}_k}{\vec{v} \cdot \vec{v}}\, \vec{v}
\tag{10.28}
\]


Under the assumption that $|\vec{v}| = 1$, we find that
$\vec{c} = (\vec{v} \cdot \vec{e}_k)\, \vec{v}$. Now that we have the center of
rotation, we need to construct our two coordinate axes to rotate about. These
coordinate axes will span the plane perpendicular to $\vec{v}$ centered at the
point C. To find the first axis, notice that $\vec{e}_k - \vec{c}$ is
perpendicular to $\vec{v}$, and that $R = |\vec{e}_k - \vec{c}|$ is the
distance from the tip of vector $\vec{e}_k$ to the point C. So we define

\[
\vec{u} = \frac{\vec{e}_k - \vec{c}}{R}
\tag{10.29}
\]

to be the unit vector orthogonal to $\vec{v}$ that starts at the point C and
points in the direction of $\vec{e}_k$. A unit vector orthogonal to both
$\vec{v}$ and $\vec{u}$ is easy to compute by using the cross product:

\[
\vec{w} = \frac{\vec{v} \times \vec{u}}{|\vec{v} \times \vec{u}|} = \vec{v} \times \vec{u}
\tag{10.30}
\]


The last two definitions of $\vec{w}$ are equivalent since
$\frac{\vec{v} \times \vec{u}}{|\vec{v} \times \vec{u}|}$ and
$\vec{v} \times \vec{u}$ are in the same direction and both cross products
result in a unit length vector. To see this, first we note that the first is
obviously unit-length; then, since $\vec{u}$ and $\vec{v}$ are unit length and
perpendicular, we are guaranteed that $\vec{v} \times \vec{u}$ will be of unit
length by the formula

\[
|\vec{v} \times \vec{u}| = |\vec{v}|\, |\vec{u}| \sin\left(\frac{\pi}{2}\right)
\]

From a computational perspective, it is easier to compute
$\vec{v} \times \vec{u}$; however, for matters of simplification, which we need
soon, we choose the first definition. Regardless of which formula you use for
$\vec{w}$, the set $\{\vec{u}, \vec{w}\}$ forms an orthonormal basis for the
plane perpendicular to $\vec{v}$. So, to rotate $\vec{e}_k$ by an angle of θ
about $\vec{v}$, which we will denote as $\vec{e}_k(\theta_{\vec{v}})$, we
simply perform the following vector operation:

\[
\vec{e}_k(\theta_{\vec{v}}) = \vec{c} + R\cos(\theta)\, \vec{u} + R\sin(\theta)\, \vec{w}
\tag{10.31}
\]

As it stands, this formula may look simple; however, $\vec{u}$ and $\vec{w}$
both are more complicated expressions. Let us investigate this further. By
looking at the definition of $\vec{u}$, we see that

\[
R\cos(\theta)\, \vec{u} = R\cos(\theta)\, \frac{\vec{e}_k - \vec{c}}{R} = \cos(\theta)\, (\vec{e}_k - \vec{c})
\tag{10.32}
\]


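The final term requires a little more work, but it does simplify after some
algebraic maneuvers:

\[
R\sin(\theta)\, \vec{w}
= R\sin(\theta)\, \frac{\vec{v} \times \vec{u}}{|\vec{v} \times \vec{u}|}
= R\sin(\theta)\, \frac{\frac{1}{R}\, \vec{v} \times (\vec{e}_k - \vec{c})}{\frac{1}{R}\, |\vec{v} \times (\vec{e}_k - \vec{c})|}
= R\sin(\theta)\, \frac{\vec{v} \times (\vec{e}_k - \vec{c})}{|\vec{v} \times (\vec{e}_k - \vec{c})|}
\tag{10.33}
\]

Now, we also have that

\[
|\vec{v} \times (\vec{e}_k - \vec{c})| = |\vec{v}|\, |\vec{e}_k - \vec{c}| \sin\left(\frac{\pi}{2}\right) = |\vec{v}|\, R
\tag{10.34}
\]

and

\[
\vec{v} \times (\vec{e}_k - \vec{c}) = \vec{v} \times \vec{e}_k - \vec{v} \times \vec{c} = \vec{v} \times \vec{e}_k
\tag{10.35}
\]

Putting equations (10.34) and (10.35) into (10.33) gives

\[
R\sin(\theta)\, \vec{w} = R\sin(\theta)\, \frac{\vec{v} \times \vec{e}_k}{|\vec{v}|\, R} = \frac{\vec{v} \times \vec{e}_k}{|\vec{v}|} \sin(\theta)
\tag{10.36}
\]

After all this simplification, we can rewrite (10.31) as

\[
\vec{e}_k(\theta_{\vec{v}}) = \vec{c} + (\vec{e}_k - \vec{c}) \cos(\theta) + \frac{\vec{v} \times \vec{e}_k}{|\vec{v}|} \sin(\theta)
\tag{10.37}
\]

For k = 1, 2, 3, we can break this expression into components:

\[
\begin{aligned}
\vec{e}_1(\theta_{\vec{v}}) &= (\alpha^2 + (1 - \alpha^2)\cos(\theta),\ \alpha\beta - \alpha\beta\cos(\theta) + \delta\sin(\theta),\ \alpha\delta - \alpha\delta\cos(\theta) - \beta\sin(\theta)) \\
\vec{e}_2(\theta_{\vec{v}}) &= (\alpha\beta - \alpha\beta\cos(\theta) - \delta\sin(\theta),\ \beta^2 + (1 - \beta^2)\cos(\theta),\ \beta\delta - \beta\delta\cos(\theta) + \alpha\sin(\theta)) \\
\vec{e}_3(\theta_{\vec{v}}) &= (\alpha\delta - \alpha\delta\cos(\theta) + \beta\sin(\theta),\ \beta\delta - \beta\delta\cos(\theta) - \alpha\sin(\theta),\ \delta^2 + (1 - \delta^2)\cos(\theta))
\end{aligned}
\]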
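
Formula (10.37) is linear in $\vec{e}_k$, so it can be applied to any vector directly. The following is a minimal sketch of that idea, not the book's code: the name rotateAbout is hypothetical, the axis is normalized inside the function, and the cross product is taken in the order $\vec{v} \times \vec{x}$ exactly as in (10.35).

```mathematica
(* Hypothetical helper: apply (10.37) to an arbitrary vector x *)
rotateAbout[axis_, t_, x_] := Module[{n = Normalize[axis], c},
  c = (n . x) n;                               (* closest point on the axis, as in (10.28) *)
  c + (x - c) Cos[t] + Cross[n, x] Sin[t]]

rotateAbout[{0, 0, 1}, Pi/2, {1, 0, 0}]                  (* {0, 1, 0} *)
Simplify[rotateAbout[{-5, 3, 11}, t, {-5, 3, 11}]]       (* {-5, 3, 11}: the axis is fixed *)
```

The second call illustrates that points on the axis of rotation do not move, whatever the angle.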


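So, given a vector $\vec{z} = (z_1, z_2, z_3)$ we can express this as
$\vec{z} = z_1 \vec{e}_1 + z_2 \vec{e}_2 + z_3 \vec{e}_3$. If we wish to rotate
$\vec{z}$ about a vector $\vec{v}$ by angle θ, the unknown matrix
$A_{\theta_{\vec{v}}}$ applied to $\vec{z}$ satisfies

\[
\begin{aligned}
A_{\theta_{\vec{v}}}\, \vec{z} &= z_1 A_{\theta_{\vec{v}}} \vec{e}_1 + z_2 A_{\theta_{\vec{v}}} \vec{e}_2 + z_3 A_{\theta_{\vec{v}}} \vec{e}_3 \\
&= z_1 \vec{e}_1(\theta_{\vec{v}}) + z_2 \vec{e}_2(\theta_{\vec{v}}) + z_3 \vec{e}_3(\theta_{\vec{v}}) \\
&= \left[\, \vec{e}_1(\theta_{\vec{v}}) \mid \vec{e}_2(\theta_{\vec{v}}) \mid \vec{e}_3(\theta_{\vec{v}}) \,\right] \vec{z}
\end{aligned}
\]

Therefore, now we have that the matrix $A_{\theta_{\vec{v}}}$'s columns are the
vectors $\vec{e}_1(\theta_{\vec{v}})$, $\vec{e}_2(\theta_{\vec{v}})$, and
$\vec{e}_3(\theta_{\vec{v}})$,

\[
A_{\theta_{\vec{v}}} =
\begin{pmatrix}
\alpha^2 + (1 - \alpha^2)\cos(\theta) & \alpha\beta - \alpha\beta\cos(\theta) + \delta\sin(\theta) & \alpha\delta - \alpha\delta\cos(\theta) - \beta\sin(\theta) \\
\alpha\beta - \alpha\beta\cos(\theta) - \delta\sin(\theta) & \beta^2 + (1 - \beta^2)\cos(\theta) & \beta\delta - \beta\delta\cos(\theta) + \alpha\sin(\theta) \\
\alpha\delta - \alpha\delta\cos(\theta) + \beta\sin(\theta) & \beta\delta - \beta\delta\cos(\theta) - \alpha\sin(\theta) & \delta^2 + (1 - \delta^2)\cos(\theta)
\end{pmatrix}
\tag{10.38}
\]

Comparing the matrix $A_{\theta_{\vec{v}}}$ above to the matrix labeled
RotMatrix in the previous set of Mathematica code reveals that we have arrived
at the same rotation matrix as that found by Mathematica.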


Example 10.4.1. After all of these calculations, it is time to do an example. Let
$\vec{w} = (-5, 3, 11)$ be a vector parallel to our axis of rotation that is a line
through the origin. We want to rotate the point P(4,-7,6) about this line through
angles θ, which are multiples of $\frac{\pi}{24}$ radians until we are back at the
point P. The plot of these points is essentially the circle of rotation for P, as seen
in Figures 10.10 and 10.11.
P = {4, -7, 6}; w = {-5, 3, 11};


Lengthw = Norm[w]


Sqrt[155]


v = w/Lengthw


{-Sqrt[5/31], 3/Sqrt[155], 11/Sqrt[155]}


CenterPt = (v.P) v

{-25/31, 15/31, 55/31}

PlotPCntr = Graphics3D[{Blue, Sphere[P, 0.5], Red, Sphere[CenterPt, 0.5]}];
LinePlotv = Graphics3D[{Thickness[.002], Black, Line[{-3 v, 7 v}]}];
RotMx = N[RotMatrix /. {α -> v[[1]], β -> v[[2]], δ -> v[[3]]}];


TxtPlots = Graphics3D[{Black, Text["C", {-0.8, 0.5, 3.5}], Text["P", {4, -7, 7.5}]}];


Manipulate[RotMxExp = Evaluate[RotMx /. {θ -> tval}]; NewPtPlotExp =
Graphics3D[{Black, Sphere[RotMxExp.P, 0.5]}]; ArrowPlotExp =
Graphics3D[{Arrowheads[.04], Thickness[.007], Black, Arrow[{CenterPt,
RotMxExp.P}]}];

Show[LinePlotv, PlotPCntr, NewPtPlotExp, ArrowPlotExp, TxtPlots,
PlotRange -> {{-10, 10}, {-10, 10}, {-10, 10}}, Axes -> True], {{tval, 0, "θ"}, 0, 2 π, π/24}]


Figure 10.10: Rotation of the point P about the line L. This frame 
EAS 


in the Manipulate command corresponds to 06= 6 ` 


PlotAllPtsP = Table[Graphics3D[{Black, Sphere[Evaluate[(RotMx /.
{θ -> tval}).P], 0.25]}], {tval, 0, 2 π, π/24}];


PlotFirstArrow = Graphics3D[{Arrowheads[.04], Thickness[.004], Black, Arrow[
{CenterPt, Evaluate[(RotMx /. {θ -> 0}).P]}]}];


Show[PlotAllPtsP, LinePlotv, PlotPCntr, TxtPlots, PlotFirstArrow,
PlotRange -> {{-12, 12}, {-12, 12}, {-12, 12}}, Axes -> True]

Figure 10.11: All 24 frames in the rotation of the sphere located at
P(4,-7,6) about a line in the direction $\vec{w} = (-5, 3, 11)$ in $\mathbb{R}^3$.


Now that we have determined all of the rotational matrices for
$\mathbb{R}^3$, we will next turn our attention to rescalings. Similar to the
two-dimensional case, a general three-dimensional rescaling is given by
matrices of the form

\[
A_{(a,b,c)} = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}
\]

for real entries a, b, and c. The scaling matrix shown here corresponds to
the linear map T((x, y, z)) = (ax, by, cz). When a, b, and c are all
positive, we simply have a scaling in the same direction as (x, y, z).
However, if at least one of the components along the diagonal is negative,
we have not just a scaling, but a reflection as well. The matrix

\[
A_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\]

is reflection about the xy-plane, since the x and y coordinates stay fixed,
and the z coordinate is replaced by -z. The matrix

\[
A_2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
\]

is reflection about the origin. The question of whether all linear maps
$T: \mathbb{R}^3 \to \mathbb{R}^3$ can be decomposed into a composite of
rotations and general rescalings is obviously a challenging one, and one
should consider all of the work done in Section 10.2 to decompose all real
2 x 2 matrices into a product of rotations, reflections, and rescaling
matrices.
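
The effect of a diagonal rescaling, and the sign of its determinant, can be inspected in a few lines. This is a minimal sketch; the helper name scale is hypothetical and simply wraps the built-in DiagonalMatrix.

```mathematica
(* Hypothetical helper: the rescaling matrix with diagonal entries a, b, c *)
scale[a_, b_, c_] := DiagonalMatrix[{a, b, c}]

scale[a, b, c] . {x, y, z}      (* {a x, b y, c z} *)
Det[scale[a, b, c]]             (* a b c *)
Det[scale[1, 1, -1]]            (* -1: reflection about the xy-plane *)
Det[scale[-1, -1, -1]]          (* -1: reflection about the origin *)
```

Both reflection matrices above have determinant -1, while their volume scaling factor, the absolute value of the determinant, is 1.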


Homework Problems 


1. Prove that the following relations hold, and give a geometric 
interpretation of this result: 


(a) $A_{\theta_x}^{-1} = A_{-\theta_x}$, (b) $A_{\theta_y}^{-1} = A_{-\theta_y}$, (c) $A_{\theta_z}^{-1} = A_{-\theta_z}$


2. Verify that no two rotation matrices commute (e.g.,
$A_{\theta_x} A_{\phi_y} \ne A_{\phi_y} A_{\theta_x}$). Remember to use
different angle variables for each matrix.

3. As a general rule, scalings and rotations do not commute.
However, prove that scalings in just one direction commute with
rotations corresponding to the axis of scaling for each of the x, y,
and z axes. As an example, show that
$A_{(a,1,1)} A_{\theta_x} = A_{\theta_x} A_{(a,1,1)}$. Can
you give a geometric interpretation of this result?


4. For rotations in the plane, it was shown that the rotation matrix
$A_\theta$ satisfied $A_\theta A_\phi = A_{\theta + \phi}$. Verify that the
same property holds for the three rotation matrices $A_{\theta_x}$,
$A_{\theta_y}$, and $A_{\theta_z}$ for rotations in $\mathbb{R}^3$.


5. Determine the 3 x 3 matrix that will rotate the point P about the
line passing through the origin in the direction of $\vec{v}$ by an angle of
θ, then compute the coordinates of the rotated point:


(a) P(1,1,1), Y = (3,0,0), 0 = = (b) P(1,0,—1), X = (0,2,0), 0 = 5 


(c) P(1,0,1), Y = (0,0, 1), @ = a (a) P(0,1,1), Y = (1,0,0), 0 = ot 


6. For each of the following, construct a single matrix that will 
satisfy each of the following set of criteria, in the sequence 
specified. 

(a) First rotate about the z-axis by $\theta = \frac{\pi}{3}$, then scale in the
x-direction by a factor of 2, and finally, scale in the y-direction
by a factor of 3.


(b) First flip about the xy-plane, next rotate about z-axis by
$\theta = \frac{\pi}{4}$, and finally, scale in the z-direction by a factor of 2.
(c) First scale in y-direction by a factor of 2, then rotate about 
z-axis by an angle of 0 = 


1 


x-direction by a factor of 2 


ta 


, then finally, scale in the 


7. Describe a procedure for rotating a point P(x, y, z) about the line
passing through the origin in the direction of
$\vec{v} = (\alpha, \beta, 0)$, where α and β are both nonzero and satisfy
$\alpha^2 + \beta^2 = 1$.

8. Decompose an arbitrary 3x3 matrix into the product of rotations, 
rescalings, and reflections. 


Mathematica Problems 


1. Homework problem 1 illustrated the fact that the inverse rotation
matrices can be found by replacing the angle θ with -θ in the
original rotation matrix. We wish to determine if the same holds
true for the rotation matrix $A_{\theta_{\vec{v}}}$.
(a) Replace θ with -θ in $A_{\theta_{\vec{v}}}$ and perform the matrix
multiplications 




(a, 8,0) 


(b) Compute $A_{\theta_{\vec{v}}}^{-1}$ using the Inverse command, simplify,
and compare the result to $A_{(-\theta)_{\vec{v}}}$.
2. Determine the 3 x 3 matrix that will rotate the point P about the
line passing through the origin in the direction of $\vec{v}$ by an angle of
θ:


(a) P(2,0,3), $\vec{v}$ = (2,-1,2)    (b) P(2,3,-1), $\vec{v}$ = (-1,0,0)
(c) P(1,0,1), $\vec{v}$ = (0,0,1)     (d) P(1,-1,0), $\vec{v}$ = (1,2,1)
(e) P(0,1,1), $\vec{v}$ = (1,0,0)     (f) P(1,-1,3), $\vec{v}$ = (-1,1,2)
3. Plot, and animate, an entire rotation of the point P about the line 


in the direction v for each of parts (a)—(f) of problem 2. 


4. Describe and test a procedure for doing rotations about any line 
in space, not necessarily through the origin. 


10.5 Affine Maps 


Closely related to linear maps are affine maps. We begin with a definition. 


Definition 10.5.1. An affine map $S: \mathbb{R}^n \to \mathbb{R}^m$ is a
translation of a linear map, that is,
$S(\vec{u}) = T(\vec{u}) + \vec{b}$ for all $\vec{u} \in \mathbb{R}^n$, where
$T: \mathbb{R}^n \to \mathbb{R}^m$ is a linear map and
$\vec{b} \in \mathbb{R}^m$ is fixed.


If A is the standard m x n matrix representing the linear map T, then A,
along with the fixed vector $\vec{b}$ in $\mathbb{R}^m$, also represents the
affine map S, since we have $S(\vec{u}) = A\vec{u} + \vec{b}$ for all
$\vec{u} \in \mathbb{R}^n$. Affine maps operate analogously to linear maps,
except that they also translate their results by $\vec{b}$. Thus, affine maps
will take a subspace $K \subseteq \mathbb{R}^n$ and return
$S(K) = L + \vec{b}$ in $\mathbb{R}^m$, where $L = T(K)$ is the subspace of
$\mathbb{R}^m$ corresponding to the image of the subspace K of
$\mathbb{R}^n$ under the map T. Hence, an affine map S translates the image
subspace of T by $\vec{b}$. This should remind you of the work we did back in
Section 8.1 on subspace translates. The fixed vector $\vec{b}$ of
$\mathbb{R}^m$ is $S(\vec{0}_n)$, since

\[
S(\vec{0}_n) = T(\vec{0}_n) + \vec{b} = \vec{0}_m + \vec{b} = \vec{b}
\]
Note that we used the fact that linear maps take the zero vector of their
domain to the zero vector of their range. We can directly relate solving a
linear system of equations $A\vec{x} = -\vec{b}$, for an m x n matrix A, to
affine maps. This matrix equation can be written as
$A\vec{x} + \vec{b} = \vec{0}_m$. The solutions to this matrix equation are
the solutions $\vec{u}$ in $\mathbb{R}^n$ to $S(\vec{u}) = \vec{0}_m$, which
is to say, solutions to $T(\vec{u}) + \vec{b} = \vec{0}_m$ for the linear map
T with standard matrix A representing it. We already know that the solutions
$\vec{x}$ in $\mathbb{R}^n$ to $A\vec{x} = -\vec{b}$ are of the form
$\vec{x} = \vec{x}_k + \vec{x}_p$, where $\vec{x}_k \in \operatorname{Ker}(T)$
corresponds to the solutions of $A\vec{x} = \vec{0}_m$, and $\vec{x}_p$ is a
fixed vector in $\mathbb{R}^n$, with $A\vec{x}_p = -\vec{b}$.
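
The definition can be tried out directly. The following is a minimal sketch with made-up data: the names affineMap, a0, b0, and s are hypothetical, and the matrix and vector are chosen only for illustration.

```mathematica
(* Hypothetical helper: the affine map u -> a.u + b *)
affineMap[a_, b_][u_] := a . u + b

a0 = {{1, 2}, {0, 1}, {3, -1}};        (* 3 x 2 matrix, so S maps R^2 to R^3 *)
b0 = {4, 5, 6};
s = affineMap[a0, b0];

s[{0, 0}]                              (* {4, 5, 6}: the zero vector is sent to b0 *)
s[{1, 1}] + s[{2, 0}] == s[{3, 1}]     (* False: S is not linear when b0 is nonzero *)
```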




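Example 10.5.1. We next do an example of an affine map. Consider
$S: \mathbb{R}^3 \to \mathbb{R}^4$ given by $S(\vec{u}) = A\vec{u} + \vec{b}$, where

\[
A = \begin{pmatrix} -3 & 1 & 7 \\ 1 & 9 & 3 \\ -1 & 19 & 13 \\ 4 & 8 & -4 \end{pmatrix},
\quad
\vec{b} = \begin{pmatrix} -8 \\ 5 \\ 2 \\ 13 \end{pmatrix}
\]

In the following Mathematica commands, we plot (Fig. 10.12) the solutions to
$S(\vec{u}) = \vec{0}_4$. In other words, we are searching for solutions to
$A\vec{u} = -\vec{b}$; the list B below holds the right-hand side $-\vec{b}$.
A = {{-3, 1, 7}, {1, 9, 3}, {-1, 19, 13}, {4, 8, -4}};
B = {{8}, {-5}, {-2}, {-13}};
RowReduce[Join[A, B, 2]] // MatrixForm


1   0   -15/7   -11/4
0   1    4/7    -1/4
0   0    0       0
0   0    0       0


From the solution given above, we can reconstruct the solution as a line given
parametrically as

\[
L = \left\{ (x, y, z) \;\middle|\; x = \frac{15}{7} t - \frac{11}{4},\ y = -\frac{4}{7} t - \frac{1}{4},\ z = t \right\}
\tag{10.39}
\]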

for t an arbitrary parameter. We can decompose L into a sum of two components; the first
is the line $L_0$, which passes through the origin; the second is a translation by a
particular vector $\vec{x}_p$. To find the particular vector, notice that all we have to
do is set t = 0 in the parametric definition of L given above, which yields
$\vec{x}_p = \left(-\frac{11}{4}, -\frac{1}{4}, 0\right)$. Now that we have $\vec{x}_p$,
the line $L_0$ is simply the remaining portion of the solution given in equation (10.39):

\[
L_0 = \left\{ (x, y, z) \;\middle|\; x = \frac{15}{7} t,\ y = -\frac{4}{7} t,\ z = t \right\}
\tag{10.40}
\]

Clearly, $L_0$ is a line through the origin, and is thus a subspace of $\mathbb{R}^3$.
As discussed earlier, the line L can be realized as a translate of the line $L_0$ by the
particular solution $\vec{x}_p$. Now, let us plot the two lines L and $L_0$ along with the
particular solution vector $\vec{x}_p$.
LineL = ParametricPlot3D[{15/7 t - 11/4, -4/7 t - 1/4, t}, {t, -3, 3}, PlotStyle ->
{Thickness[0.007], Blue}];


LineL0 = ParametricPlot3D[{15/7 t, -4/7 t, t}, {t, -3, 3}, PlotStyle ->
{Thickness[0.007], Red}];


PtsPlot = Graphics3D[{Black, Sphere[{0, 0, 0}, 0.2], Black, Sphere[{-11/4, -1/4, 0},
0.2]}];


ArrowPlot = Graphics3D[{Arrowheads[.05], Thickness[.007], Black, Arrow[{{0, 0,
0}, {-11/4, -1/4, 0}}]}];


TxtPlots = Graphics3D[{Black, Text["O", {0, 0, 0.5}], Text["P", {-11/4, -1/4, 0.5}]}];
Show[LineL, LineL0, PtsPlot, ArrowPlot, TxtPlots, PlotRange -> All]


Figure 10.12: The line L can be decomposed into a line through 
the origin and a particular vector solution. 
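
The decomposition into a particular solution plus a kernel vector can also be obtained without reading it off the row-reduced matrix. This is a sketch that swaps in the built-in functions LinearSolve and NullSpace for the RowReduce step of the example; the lowercase names a, rhs, xp, and k are hypothetical and chosen so as not to overwrite A and B.

```mathematica
a = {{-3, 1, 7}, {1, 9, 3}, {-1, 19, 13}, {4, 8, -4}};
rhs = {8, -5, -2, -13};

xp = LinearSolve[a, rhs];         (* one particular solution of a.x == rhs *)
k = First[NullSpace[a]];          (* a vector spanning the solutions of a.x == 0 *)

(* every point xp + t k solves the system *)
Simplify[a . (xp + t k) - rhs]    (* {0, 0, 0, 0} *)
```

The particular solution and the spanning vector returned by Mathematica may differ from those in (10.39) and (10.40) by a shift along the line and by a scalar multiple, respectively; the line itself is the same.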




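Recall that a linear map $T: \mathbb{R}^2 \to \mathbb{R}^2$ is uniquely
determined by taking a line segment in the domain to another line segment in
the range. This is no longer the case for an affine map
$S: \mathbb{R}^2 \to \mathbb{R}^2$. As it turns out, an affine map
$S: \mathbb{R}^2 \to \mathbb{R}^2$ is uniquely determined by taking a triangle
in the domain to another triangle in the range. To see how this is the case,
let the triangle in the domain be defined as the interior of the three points
that are the terminal points of the following three vectors:

\[
T_1 = \{(x_1, y_1), (x_2, y_2), (x_3, y_3)\}
\]

Similarly for the triangle in the range

\[
T_2 = \{(z_1, w_1), (z_2, w_2), (z_3, w_3)\}
\]

Then

\[
S((x_j, y_j)) = (z_j, w_j), \quad 1 \le j \le 3
\]

Given $S(\vec{u}) = A\vec{u} + \vec{b}$ with A and $\vec{b}$ defined
respectively by

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \vec{b} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\]

we have the following system of equations:

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x_j \\ y_j \end{pmatrix}
+ \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
= \begin{pmatrix} z_j \\ w_j \end{pmatrix}, \quad 1 \le j \le 3
\]

This gives the new single matrix equation

\[
\begin{pmatrix}
x_1 & y_1 & 0 & 0 & 1 & 0 \\
0 & 0 & x_1 & y_1 & 0 & 1 \\
x_2 & y_2 & 0 & 0 & 1 & 0 \\
0 & 0 & x_2 & y_2 & 0 & 1 \\
x_3 & y_3 & 0 & 0 & 1 & 0 \\
0 & 0 & x_3 & y_3 & 0 & 1
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \\ d \\ \alpha \\ \beta \end{pmatrix}
=
\begin{pmatrix} z_1 \\ w_1 \\ z_2 \\ w_2 \\ z_3 \\ w_3 \end{pmatrix}
\tag{10.41}
\]

This last matrix equation can be solved for our six unknowns
{a, b, c, d, α, β}, which determine the affine map uniquely.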
£}, which determine the affine map uniquely. 


Example 10.5.2. As an example, we will now find the affine map S that sends the 
triangle Tı to triangle T2 defined by the following sets of points: 


Tı = {(—9, —5), (4, 17), (1, —6) } 
Tə = {(3, 11), (—8, 3), (—2, —15)} 


W = {11, 3, -15}; X = {9, 4, 1};
Y = {-5, 17, -6}; Z = {3, -8, -2};


R = {{Z[[1]]}, {W[[1]]}, {Z[[2]]}, {W[[2]]}, {Z[[3]]}, {W[[3]]}};
(M = {{X[[1]], Y[[1]], 0, 0, 1, 0}, {0, 0, X[[1]], Y[[1]], 0, 1},
   {X[[2]], Y[[2]], 0, 0, 1, 0}, {0, 0, X[[2]], Y[[2]], 0, 1},
   {X[[3]], Y[[3]], 0, 0, 1, 0}, {0, 0, X[[3]], Y[[3]], 0, 1}}) // MatrixForm


9  -5   0   0   1   0
0   0   9  -5   0   1
4  17   0   0   1   0
0   0   4  17   0   1
1  -6   0   0   1   0
0   0   1  -6   0   1


(AB = Inverse[M].R) // MatrixForm


  121/181
  -63/181
  580/181
   66/181
 -861/181
-2899/181


(A = {{AB[[1, 1]], AB[[2, 1]]}, {AB[[3, 1]], AB[[4, 1]]}}) // MatrixForm
B = {AB[[5, 1]], AB[[6, 1]]};


(A.{X[[1]], Y[[1]]} + B) // MatrixForm


3
11


(A.{X[[2]], Y[[2]]} + B) // MatrixForm


-8
3


(A.{X[[3]], Y[[3]]} + B) // MatrixForm


-2
-15


The last three Mathematica commands are simply verifications that the vectors
$(x_k, y_k)$ determining the corners of triangle $T_1$ were sent to their corresponding
counterparts $(z_k, w_k)$ of $T_2$. Hence, our formula works. Now you should see
what it takes to determine an affine map $S: \mathbb{R}^2 \to \mathbb{R}^2$ uniquely, and in general when
the dimensions are not the same for both domain and range.


We have previously looked at rotations about the origin in $\mathbb{R}^2$ and
rotations about a line through the origin in $\mathbb{R}^3$. The next logical question
to ask is: What happens if we rotate about a point (a, b) in $\mathbb{R}^2$ that is not
the origin, or rotate about a line that does not pass through the origin in
$\mathbb{R}^3$? You should have guessed that we get an affine map for both rotations.
In Section 8.1, we already expressed this concept in terms of matrix
multiplication, so it is fairly straightforward to rewrite the ideas in terms
of maps instead.


We will once again define $A_\theta$ to be the 2 × 2 rotation matrix for the fixed
angle θ about the origin in $\mathbb{R}^2$, defined in the standard way. If we now
wish to rotate a point corresponding to the terminal end of a vector $\vec{u}$
about the point (a, b), we need to use the affine map S. Here, S is defined
as follows:

\[ S(\vec{u}) = A_\theta\,(\vec{u} - (a, b)) + (a, b) = A_\theta\,\vec{u} + \vec{b} \tag{10.42} \]

where

\[ \vec{b} = (a, b) - A_\theta\,(a, b) = (I_2 - A_\theta)\,(a, b) \tag{10.43} \]
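Equations (10.42) and (10.43) can be tried out with a minimal sketch. It assumes that "the standard way" means the counterclockwise rotation matrix returned by the built-in RotationMatrix; the names RotateAboutPoint, ctr, and Aθ are hypothetical and do not come from the text.

```mathematica
(* Hypothetical helper: rotate the point u about the center ctr through the angle θ. *)
(* Assumption: A_θ is the counterclockwise rotation matrix RotationMatrix[θ].         *)
RotateAboutPoint[u_, ctr_, θ_] := Module[{Aθ, b},
  Aθ = RotationMatrix[θ];
  b = (IdentityMatrix[2] - Aθ).ctr;   (* translation vector, as in (10.43) *)
  Aθ.u + b]                           (* affine form, as in (10.42) *)

RotateAboutPoint[{2, 1}, {1, 1}, Pi/2]   (* a quarter turn of (2, 1) about (1, 1) *)
RotateAboutPoint[{1, 1}, {1, 1}, Pi/3]   (* the center itself *)
```

Under the stated assumption, the first call should move (2, 1) to (1, 2), and the second should return the center (1, 1) unchanged, since the center is the one point that the rotation fixes.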


Note that this is similar to the matrix multiplication version from Section
8.1, where we translated our frame of rotation from the point (a, b) to the
origin, and then translated back by adding the point (a, b) to the resulting
rotated point. The same can be done for a rotation about a line L not
through the origin in $\mathbb{R}^3$. If $A_\theta$ is the 3 × 3 rotation matrix for the fixed
angle θ about the line L′ parallel to L but through the origin, then the
affine map S will rotate about L any point that is represented by the
terminal end of a vector $\vec{u} \in \mathbb{R}^3$, and is defined as

\[ S(\vec{u}) = A_\theta\,(\vec{u} - (a, b, c)) + (a, b, c) = A_\theta\,\vec{u} + \vec{b} \tag{10.44} \]

for

\[ \vec{b} = (a, b, c) - A_\theta\,(a, b, c) = (I_3 - A_\theta)\,(a, b, c) \tag{10.45} \]

Here the point (a, b, c) is a fixed point on the line L.
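A short check, added here as a supplementary derivation, confirms that S leaves every point of L in place. It uses only the fact that a rotation about L′ fixes the direction vector of L′; the symbols $\vec{p}$, $\vec{d}$, and $t$ are notation introduced for this check.

```latex
\[
\vec{p} = (a, b, c), \qquad A_\theta\,\vec{d} = \vec{d} \quad \text{for the direction vector } \vec{d} \text{ of } L'
\]
\[
S(\vec{p} + t\,\vec{d}) = A_\theta\big((\vec{p} + t\,\vec{d}) - \vec{p}\big) + \vec{p}
= t\,A_\theta\,\vec{d} + \vec{p} = \vec{p} + t\,\vec{d}
\]
```

So the choice of the point (a, b, c) on L does not matter: any point of L gives the same map S.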


Example 10.5.3. Now, let us do an example of this three-dimensional rotation about a
line L that does not pass through the origin. First, we consider the line going through the
point P(11, 25, −7) that is parallel to the vector $\vec{w} = (5, 2, 9)$. We
want to rotate the point Q(4, −7, 13) about this line through angles θ that are
multiples of 10°, until we are back at the point Q. Each rotation through a fixed angle
θ is one application of an affine map. The plot of these points, which can be seen in
Figures 10.13 and 10.14, is essentially the circle of rotation for Q. Referring back to
Section 4.4, the formula for the rotated point $Q_{new}$ will be given by

\[ Q_{new} = A_{\vec{v}}\,(Q - P) + P = (I_3 - A_{\vec{v}})\,P + A_{\vec{v}}\,Q \tag{10.46} \]

In the Mathematica commands below, the lengthy formula defining A is simply the
expression for $A_{\vec{v}}$ given in equation (10.38), and the matrix B corresponds to the
column matrix $(I_3 - A_{\vec{v}})\,P$, giving the formula for $Q_{new}$ to now be

\[ Q_{new} = A_{\vec{v}}\,Q + B \]


A = {{α^2 + (1 - α^2) Cos[θ], α β - α β Cos[θ] + δ Sin[θ], α δ - α δ Cos[θ] - β Sin[θ]},
   {α β - α β Cos[θ] - δ Sin[θ], β^2 + (1 - β^2) Cos[θ], β δ - β δ Cos[θ] + α Sin[θ]},
   {α δ - α δ Cos[θ] + β Sin[θ], β δ - β δ Cos[θ] - α Sin[θ], δ^2 + (1 - δ^2) Cos[θ]}};


Q = {4, -7, 13}; P = {11, 25, -7}; w = {5, 2, 9};
v = w/Norm[w];

Av = A /. {α -> v[[1]], β -> v[[2]], δ -> v[[3]]};

B = (IdentityMatrix[3] - Av).P;

CenterPt = Simplify[(v.Q) v + P]


{365/22, 1498/55, 337/110}

LinePlotv = Graphics3D[{Thickness[.004], Red, Line[{-2 v + P, 22 v + P}]},
   Axes -> True];


PlotPQCenterPt = Graphics3D[{Blue, Sphere[Q, 1.5], Blue, Sphere[P, 1.5], Black,
    Sphere[CenterPt, 1.5], Black, Sphere[{0, 0, 0}, 1.5]}];


TxtPlots = Graphics3D[{Black, Text["O", {0, 0, -4}], Text["C", {16, 29, 7}],
    Text["P", {11, 25, -12}], Text["Q", {4, -7, 18}]}];


Manipulate[AvQB = Evaluate[N[Av.Q + B] /. {θ -> tval}];
 NewPtPlotExp = Graphics3D[{Blue, Sphere[AvQB, 1.5]}];
 ArrowPlotExp = Graphics3D[{Arrowheads[.04], Thickness[.007], Black, Arrow[{CenterPt, AvQB}]}];
 Show[TxtPlots, LinePlotv, NewPtPlotExp, ArrowPlotExp, PlotPQCenterPt,
  PlotRange -> {{-40, 60}, {-20, 65}, {-30, 30}}, Axes -> True],
 {{tval, 0, "θ"}, 0, 2 π, π/24}]


Figure 10.13: The point Q (at the tip of the arrow) is rotated
about the line passing through the point P (the point on the line
not at the tail of the vector) in the direction of the vector w. The
origin is also shown for reference and is located directly below the


point Q and 0 = 5 i 




PtsQ = Table[Evaluate[N[Av.Q + B] /. {θ -> tval}], {tval, 0, 2 π, π/24}];


PlotAllPtsQ = Table[Graphics3D[{Blue, Sphere[Evaluate[N[Av.Q + B] /.
       {θ -> tval}], 0.5]}], {tval, 0, 2 π, π/32}];


Show[TxtPlots, LinePlotv, PlotAllPtsQ, PlotPQCenterPt, PlotRange -> {{-40, 60},
   {-20, 65}, {-30, 30}}, Axes -> True, AspectRatio -> 1]


Figure 10.14: All of the rotated points computed in the 
Manipulation procedure given before Figure 10.13. 




We end this section with a Mathematica module called
RotationsInSpace, which takes a nonzero vector $\vec{v} = (\alpha, \beta, \delta)$
parallel to the line L of rotation, where the line of rotation goes through the
point P (which could be the origin) but L is not parallel to any axis or
coordinate plane (no component of $\vec{v}$ can be 0), and then rotates a point
Q through the angle θ about L. The vector $\vec{v}$ and the two points P and Q
can be given as lists, although $\vec{v}$ as a vector also works:


RotationsInSpace[v_, P_, Q_, θ_] := Module[{w, α, β, δ, A, B},
  w = v/Norm[v];
  α = w[[1]]; β = w[[2]]; δ = w[[3]];
  A = {{α^2 + (1 - α^2) Cos[θ], α β - α β Cos[θ] + δ Sin[θ], α δ - α δ Cos[θ] - β Sin[θ]},
    {α β - α β Cos[θ] - δ Sin[θ], β^2 + (1 - β^2) Cos[θ], β δ - β δ Cos[θ] + α Sin[θ]},
    {α δ - α δ Cos[θ] + β Sin[θ], β δ - β δ Cos[θ] - α Sin[θ], δ^2 + (1 - δ^2) Cos[θ]}};
  B = P - A.P;
  Return[A.Q + B]]


Q1 = N[RotationsInSpace[{5, 2, 9}, {11, 25, -7}, {4, -7, 13}, π/3]]
{-17.7428, 23.1956, 18.3692}
v = {5, 2, 9};
w = v/Norm[v];
P = {11, 25, -7}; Q = {4, -7, 13};
Cntr = Simplify[(w.Q) w + P]
{365/22, 1498/55, 337/110}
N[Norm[Q - Cntr]]
37.8073
N[Norm[Q1 - Cntr]]
37.8073


Homework Problems 


1. Find the solutions to the equations S(T) = if for each of the 


following affine maps: 




1-2 2 “1 
0 3 2 2 35 1 = 

©] -2 -4 -1 |+|- of 1 a|- i| 
4 5 6 3 


2. State the dimension for each of the solutions to the affine maps 
from problem 1. 

3. How many points in $\mathbb{R}^n$ does it take to define an affine
transformation $S: \mathbb{R}^n \to \mathbb{R}^n$ uniquely?


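4. Referring to the matrix equation (10.41), construct the matrix
used to determine the number of vectors needed to create a unique
affine map from $\mathbb{R}^2$ to $\mathbb{R}^3$ of the form

\[ S(\vec{u}) = \begin{pmatrix} a & b \\ c & d \\ e & f \end{pmatrix} \vec{u} + \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} \]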


5. Determine the number of vectors needed in $\mathbb{R}^n$ and $\mathbb{R}^m$, with n,
m > 1, which will determine an affine map $S: \mathbb{R}^n \to \mathbb{R}^m$ uniquely.
It may help to construct a generic matrix, as in problem 4 and
equation (10.41).

6. Express, as an affine map, the rotation of an arbitrary vector
$\vec{q} \in \mathbb{R}^2$ through an angle θ about the point (1, −1).


7. Can a single affine transformation $S: \mathbb{R}^2 \to \mathbb{R}^2$ send the interior
and sides of the unit square to the interior and sides of an arbitrary
quadrilateral in $\mathbb{R}^2$? If yes, explain why. If no, can it be done for
any special type of quadrilateral?




Mathematica Problems 


1. Construct an affine map that takes the first set of points S to the
second set of points T, with each point $\vec{s}_i$ of S sent to the corresponding point $\vec{t}_i$ of T.


(a) S = {3 = (1,—1,1), 3 = (0,2, -3), 33 = (2,4, —-3), 3? = (10, - 3,2)} 
T= {= (1,2), 2 = (4,2), t = (4,5), etn} 

(b) S= {x = (1,2), Z = (-2,3), 3 = (5, -1)} 
T= {i = (1,2,-1), È =(7,8,-2), 6 = (3, -5,6)} 

(c) S= {si = (1,1,3), 32 = (-2,0,5), 33 = (4,3, -3), 3 = (7, -6,8)} 
T= {z = (1,2, —2,1), & = (3,5,6,7), ts = (—4, —6,9,0}, 


% = (-3,-2, 5,-3)} 


2. Construct a family of affine maps that takes the first set of points 
S to the second set of points T. 


a a CaS 9 
T= {7 = (3,-8, -9,1), & = (4,5, 5,7, -6)} 
i 


(b) S = {3% = (—4,5,0,3), 3 = (0,3, -2,1), 3 = (2,5,8,-1)} 
T= {i = {1,6,3), t = (—4,3,7), & = (2,3, -2)} 


3. Rotate the point (—3, 4) about the point (1, 1). Perform a complete 
rotation, including at least 10 frames in your animation. 

4. Rotate the point (—10, —9, 9) about the line that passes through the 
point (1, 1, 1) and points in the direction (2, —2, 4). Perform a 
complete rotation, including at least 10 frames in your animation. 

5. Rotate a sphere of radius 2, with center (3, 3, 0), about the line 
that passes through the point (—1, —1, 1), and points in the direction 
perpendicular to the line that passes through the center of the sphere 
and the point (-1, —1, 1). Perform a complete rotation, including at 
least 10 frames in your animation. 




Research Projects 


1. An interesting application of affine maps has been used by 
Michael Barnsley to construct geometric objects called fractals. 
Research the topic of fractals in Barnsley’s book Fractals 
Everywhere [2], and write Mathematica code to generate his Fern 
fractal example. You might also look at the book Elementary Linear 
Algebra with Applications, 7th edition (or later) by Howard Anton 
and Chris Rorres [3], for some information on fractals. 




Chapter 11 


Least-Squares Fits and Pseudoinverses


11.1 Pseudoinverse to a Nonsquare Matrix and Almost Solving an Overdetermined Linear System


Finding the inverse to a matrix A so far has been restricted to A being 
square. The question is: Why? If you go back to Section 5.1, you will 
notice the relationship expressed between a matrix’s inverse and 
determinant. The determinant is a construct that is realizable only when 
dealing with a matrix of square dimension, and thus, the same must be 
true of the inverse to a matrix. Now we ask the question: If A is a 
nonsquare m X n matrix, is it possible to find some matrix B that, in some 
way, acts like an inverse matrix to A? It is by no means obvious that such 
a matrix B should exist, or even how it might behave. Instead, we start 
with something a bit more obvious. Once again, under the assumption
$A \in \mathbb{R}^{m \times n}$, notice that $AA^T \in \mathbb{R}^{m \times m}$ and $A^TA \in \mathbb{R}^{n \times n}$. These two square
matrices might have inverses. The issue becomes whether one is preferred
over the other. One of the main reasons for finding an inverse to a matrix
is to solve a system of equations. So let us consider a linear problem of the
form

\[ A\vec{x} = \vec{b} \tag{11.1} \]

where $A \in \mathbb{R}^{m \times n}$, $\vec{x} \in \mathbb{R}^{n \times 1}$, and $\vec{b} \in \mathbb{R}^{m \times 1}$. Here we consider
vectors of $\mathbb{R}^n$ and $\mathbb{R}^m$ to be matrices with only one column, as is the
standard interpretation. This will make it easier to determine the proper
choice for constructing a square system. By examining dimensions of the
matrices, notice that we can actually compute $A^T\vec{b}$. This implies that we
should consider the following new system:

\[ A^TA\,\vec{y} = A^T\vec{b} \tag{11.2} \]


This is now a square system, where we have replaced the unknown
solution vector $\vec{x}$, from the original nonsquare system, with $\vec{y}$, for the
unknown solution to the new square system. It may seem counterintuitive,
but the two systems may not have the same solution sets, which is why we
have changed solution variables when going from the nonsquare system to
the square system. The main reason why this is so is that one cannot do
any matrix algebra, in terms of matrix multiplication, to arrive back at the
original system. However, each solution $\vec{x}$ of the original system is a
solution to the new system. To see this, consider a specific solution
$\vec{x}_p$ to (11.1). If we left-multiply both
sides by $A^T$, then the equation still holds; hence $\vec{x}_p$ also satisfies (11.2).
Unfortunately, as stated previously, if $\vec{y}_p$ is a solution to (11.2), there is
no way, without manual checking, to verify that $\vec{y}_p$ would also be a
solution to (11.1), as we cannot compute $(A^T)^{-1}$ when A is not a square
matrix.


Once we have left-multiplied both sides of the nonsquare system by $A^T$ to
make the system square, we can attempt to find a solution by standard
methods. For instance, if $A^TA$ has an inverse, then we arrive at the single
solution

\[ \vec{y} = (A^TA)^{-1}A^T\vec{b} \]

This solution $\vec{y}$ suggests the following definition.


Definition 11.1.1. The pseudoinverse of a matrix $A \in \mathbb{R}^{m \times n}$, denoted by
p(A), is defined to be

\[ p(A) = (A^TA)^{-1}A^T \]

One should immediately notice that the concept of a pseudoinverse
reduces to the standard definition of invertibility if A is square and
invertible. To see this, remember that $(AB)^{-1} = B^{-1}A^{-1}$, and hence

\[ p(A) = (A^TA)^{-1}A^T = A^{-1}(A^T)^{-1}A^T = A^{-1}I = A^{-1} \]


As a result, the idea of a pseudoinverse generalizes the process of solving
square systems of equations to solving nonsquare systems of the form
(11.1). In the case of a square system where A is invertible, the
pseudoinverse approach and the matrix inverse approach would yield the
exact same solution set.
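The definition can be tried on a small matrix with a minimal sketch. The matrix below is an arbitrary illustration, not one from the text, and the comparison with the built-in PseudoInverse (the Moore-Penrose pseudoinverse) relies on the assumption that the columns of A are linearly independent, in which case the two constructions coincide.

```mathematica
(* Illustrative 3 x 2 matrix with linearly independent columns (an assumption of this sketch). *)
A = {{1, 2}, {3, 4}, {5, 6}};

pA = Inverse[Transpose[A].A].Transpose[A];   (* p(A) as in Definition 11.1.1 *)

pA.A == IdentityMatrix[2]    (* p(A) acts as a left inverse of A *)
pA == PseudoInverse[A]       (* comparison with Mathematica's built-in pseudoinverse *)
```

Both comparisons should return True for exact input like this. Note that p(A) is only a left inverse: the product A.pA is a 3 × 3 matrix that is generally not the identity.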


Now that we have defined a pseudoinverse, and we have a process for
computing it, we must ask ourselves whether this process can be applied
to find the pseudoinverse of any matrix. If we consider a matrix
$A \in \mathbb{R}^{m \times n}$ with m < n, then with a little work, we can show that $\det(A^TA) = 0$,
and hence that p(A) does not exist. Therefore, pseudoinverses can exist
only if m ≥ n, as we shall see by way of the following examples. We will
first attempt to take the pseudoinverse of arbitrary matrices of dimension
1 × 3, 2 × 3, 3 × 4, and 2 × 4. To determine whether the pseudoinverse can
be taken, we will have Mathematica compute the determinant of the
matrix $A^TA$. Note the end result in each case:


(A = {{a, b, c}}) // MatrixForm

( a  b  c )

Det[Transpose[A].A]

0

(A = {{a, b, c}, {d, e, f}}) // MatrixForm

a  b  c
d  e  f

Det[Transpose[A].A]

0

(A = {{a, b, c, d}, {e, f, g, h}, {i, j, k, l}}) // MatrixForm

a  b  c  d
e  f  g  h
i  j  k  l

Det[Transpose[A].A]

0

(A = {{a, b, c, d}, {e, f, g, h}}) // MatrixForm

a  b  c  d
e  f  g  h

Det[Transpose[A].A]

0


Clearly, the determinant for each of the four matrices given above was 0,
and in each case, m < n. You should ask yourself why this will always be
true when m < n.


Example 11.1.1. Next, we put our pseudoinverse to work for us on problems where we
can actually apply it. Consider the following problem, expressed in matrix form as
follows:

\[ \begin{pmatrix} -5 & 2 & -9 \\ 7 & 1 & 3 \\ 8 & -4 & 0 \\ -6 & 10 & -3 \end{pmatrix} \vec{x} = \begin{pmatrix} -11 \\ 6 \\ 1 \\ 5 \end{pmatrix} \]

This problem is in the form of equation (11.1), and since A is not square, we cannot
simply multiply both sides by $A^{-1}$. Obviously, we can still attempt to solve this system of
equations by augmenting the two matrices and row reducing. However, it should be of
interest to try this new method we have discussed as well.


A = {{-5, 2, -9}, {7, 1, 3}, {8, -4, 0}, {-6, 10, -3}};
(pseudoA = Inverse[Transpose[A].A].Transpose[A]) // MatrixForm


    1609/78105       6404/78105     2116/26035      1577/78105
    -143/15621       1091/15621       32/5207       1520/15621
 -26372/234315    -1762/234315    -5168/78105     -751/234315


B = {{-11}, {6}, {1}, {5}};
(Y = pseudoA.B) // MatrixForm


  34958/78105
  15815/15621
260261/234315


RowReduce[Join[A, B, 2]] // MatrixForm


1  0  0  0
0  1  0  0
0  0  1  0
0  0  0  1

This row-reduced matrix tells us that the system $A\vec{x} = \vec{b}$ has no solution, since its last
row is the equation 0 = 1. If you recall, a system $A\vec{x} = \vec{b}$, where A is m × n with m >
n, is said to be overdetermined, because it has more equations than variables. Such
systems typically have no solutions, as we have just seen. Unfortunately,
overdetermined linear systems occur very often in real life, as data can be sampled
frequently, but the samples will rarely all lie on the graph of the function with which we
wish to model the data. Thus, we need an approximate solution to such a system if no
actual solution exists.


N[A.Y] // MatrixForm


-10.2096
7.47765
-0.469061
4.10654


N[(A.Y - B)] // MatrixForm


0.790372
1.47765
-1.46906
-0.893464


N[Norm[A.Y - B]]


2.40095
Notice that if $\vec{y}$ is the unique solution to the new system (11.2), with $\vec{y} = p(A)\vec{b}$, then
$A\vec{y} \approx \vec{b}$. Therefore, $\vec{y}$ is approximately a solution to the original system $A\vec{x} = \vec{b}$,
even though the system $A\vec{x} = \vec{b}$ has no actual solutions. To determine how close $A\vec{y}$
is to $\vec{b}$, we simply evaluate $|A\vec{y} - \vec{b}|$, as was done in the last
Norm command in the section of Mathematica code above. As seen below, if we take the
four 3 × 3 square subsystems of $A\vec{x} = \vec{b}$ (i.e., remove one of the equations or
datapoints), then our solution is close to the unique solutions of all four subsystems.
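One further check on the computed solution can be sketched as follows. Rearranging (11.2) shows that the residual A.Y - B must be sent to zero by Transpose[A]. The second line compares Y with the built-in LeastSquares solver; that the two agree is stated here as an expectation for a matrix with linearly independent columns, not as something shown in the text.

```mathematica
A = {{-5, 2, -9}, {7, 1, 3}, {8, -4, 0}, {-6, 10, -3}};
B = {{-11}, {6}, {1}, {5}};
Y = Inverse[Transpose[A].A].Transpose[A].B;

Transpose[A].(A.Y - B)                       (* (11.2) rearranged: a zero column *)
Flatten[Y] == LeastSquares[A, Flatten[B]]    (* comparison with the built-in solver *)
```

The first output is a column of three zeros, which is just system (11.2) written as $A^T(A\vec{y} - \vec{b}) = \vec{0}$; with exact input, the comparison in the last line should return True.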


(Soln1 = RowReduce[Drop[Join[A, B, 2], {1}, {}]][[All, 4]]) // MatrixForm


55/92
87/92
20/69


N[(Y - Soln1)] // MatrixForm


-0.150249
0.066767
0.820876


(Soln2 = RowReduce[Drop[Join[A, B, 2], {2}, {}]][[All, 4]]) // MatrixForm


33/43
221/172
93/86


N[(Y - Soln2)] // MatrixForm


-0.319865
-0.272465
0.0293359


(Soln3 = RowReduce[Drop[Join[A, B, 2], {3}, {}]][[All, 4]]) // MatrixForm


22/171
169/171
37/27


N[(Y - Soln3)] // MatrixForm


0.318922
0.0241151
-0.259639


(Soln4 = RowReduce[Drop[Join[A, B, 2], {4}, {}]][[All, 4]]) // MatrixForm


33/104
5/13
353/312


N[(Y - Soln4)] // MatrixForm


0.130269
0.627804
-0.020679


So the solution $\vec{y}$ to the square system is close to the four solutions to each 3 × 3
subsystem of the original nonsquare system. Now let's plot the four planes that are the
four equations of the original system with the solution $\vec{y}$. We will let the variables in the
system be x, y, and z, so that each of the four equations of this system is a plane in
three-space. We will then plot these four planes with the solution to the new system
$A^TA\,\vec{y} = A^T\vec{b}$. This is depicted in Figure 11.1.


X = {x, y, z};
(Planes = A.X - B) // MatrixForm


11 - 5x + 2y - 9z
-6 + 7x + y + 3z
-1 + 8x - 4y
-5 - 6x + 10y - 3z


PlanePlots = ContourPlot3D[Evaluate[Flatten[Planes]], {x, -4, 4},
   {y, -4, 4}, {z, -4, 4}, Mesh -> None, ContourStyle -> {{Red}, {Blue},
     {Yellow}, {Brown}}];


YPlot = Graphics3D[{White, Sphere[Flatten[Y], 0.6]}];
Show[PlanePlots, YPlot]


Figure 11.1: The system of planes and the solution $\vec{y}$ (sphere) that
best approximates the intersection of these planes.


The sphere of radius 3/5 in this plot has a center at the solution $\vec{y}$. Notice that $\vec{y}$ is the
point in three-space corresponding to the location of closest intersection of all four
planes. In Section 11.3, we will explore in greater detail the reasons why the
pseudoinverse solution appears to be the best approximate solution to overdetermined
systems.


Pay special attention to how the third plane was defined above. The equation in question
is given by 8x − 4y − 1 = 0. Notice that the other three planes are solved for in terms of z, yet
this plane has no z variable in it. Hence, the plane 8x − 4y − 1 = 0 corresponds to a plane
that contains the line 8x − 4y = 1 of the xy-plane, is independent of z, and sticks out of the
xy-plane in a perpendicular fashion. To graph this, we could have solved for another
variable instead, or we can graph it parametrically. Solving for y in terms of x gives
y = 2x − 1/4, and hence, if we let x = t, the line y = 2x − 1/4 is given parametrically by

\[ (x(t), y(t)) = \left(t,\ 2t - \tfrac{1}{4}\right), \quad t \in \mathbb{R} \]

Since this is a plane, we need our third coordinate z to be represented somehow. In this
case, it does not depend on x or y, and from previous work, we know that surfaces in $\mathbb{R}^3$
must be defined parametrically as a function of two variables. We will call this second
variable s, so our final parameterization of the plane 8x − 4y − 1 = 0 is given by

\[ (x(s,t), y(s,t), z(s,t)) = \left(t,\ 2t - \tfrac{1}{4},\ s\right), \quad s, t \in \mathbb{R} \]


This parameterization can be found in the definition of plane3 above. 
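For illustration, a definition of this kind might look like the following minimal sketch. The plotting ranges and the style options are assumptions chosen to match the other plots in this example, and the exact form of plane3 used for the figure may differ.

```mathematica
(* Sketch: the plane 8x - 4y - 1 = 0 drawn from the parameterization (t, 2t - 1/4, s). *)
(* The ranges and PlotStyle below are assumptions, not taken from the text.            *)
plane3 = ParametricPlot3D[{t, 2 t - 1/4, s}, {t, -4, 4}, {s, -4, 4},
   Mesh -> None, PlotStyle -> Yellow];

(* Check that every parameterized point satisfies the plane equation. *)
Simplify[8 x - 4 y - 1 /. {x -> t, y -> 2 t - 1/4}]
```

The last line simplifies to 0 for every t, confirming that the parameterization stays on the plane; the parameter s never enters the equation, which is the algebraic form of "independent of z".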


We end this section with an example that shows how the pseudoinverse 
method can be used to find solutions to overdetermined systems. If you 
recall, overdetermined systems rarely have solutions. For instance, three 
lines in the xy-plane most likely will not intersect one another at the same 
point. However, if they did, would the pseudoinverse method give us this 
answer? 


Example 11.1.2. Consider the following system of equations:

\[ \begin{aligned} 2x - y &= -7 \\ -5x - 2y &= 13 \\ 3x + 3y &= -6 \end{aligned} \tag{11.3} \]

which in matrix form $A\vec{x} = \vec{b}$ is given by

\[ \begin{pmatrix} 2 & -1 \\ -5 & -2 \\ 3 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -7 \\ 13 \\ -6 \end{pmatrix} \]

The solution to system (11.3) is {x = −3, y = 1}. Now we shall have Mathematica solve
this overdetermined system, yielding the correct solution:


Eqns = {2 x - y == -7, -5 x - 2 y == 13, 3 x + 3 y == -6};
Solve[Eqns, {x, y}]
{{x -> -3, y -> 1}}


A = {{2, -1}, {-5, -2}, {3, 3}}; B = {{-7}, {13}, {-6}};
(pseudoA = Inverse[Transpose[A].A].Transpose[A]) // MatrixForm


 5/27   -4/27   -1/27
-8/27    1/27    7/27


(IntersectPt = pseudoA.B) // MatrixForm


-3
1


From this output, we see that Mathematica arrived at the correct solution via the
pseudoinverse method. Now, let us plot this system of lines and the solution so that we get
a geometric interpretation of the result given above (see Fig. 11.2).


Figure 11.2: Overdetermined system (11.3) consists of three lines 
in the xy-plane that have a simultaneous solution (—3, 1) found by 
the pseudoinverse method. 


IntersectPlot = Graphics[{Black, Thickness[0.005], Circle[Flatten[IntersectPt], 0.1]}];

LinePlots = ContourPlot[Evaluate[Eqns], {x, -5, 0}, {y, -2, 3},
   ContourStyle -> {{Red, Thickness[0.01]}, {Black, Thickness[0.01]}, {Blue,
      Thickness[0.01]}}, PlotRange -> All, Axes -> True, Frame -> False];
Show[LinePlots, IntersectPlot]


Homework Problems 


1. If $A \in \mathbb{R}^{m \times n}$, what are the dimensions of p(A)?


2. Compute the pseudoinverse of each of the following matrices (if 
no pseudoinverse exists, then state so): 




1; -1 2 4 
: ea | 
o[$ 4] o[$4] of teal 
F ` ; 1 nit 3 —2+i 1-2i 
(g) 03 4 (h) 0 3 (i) i -2 — Îi 
3 d =j 4 g 3i 0 


3. For each of the following nonsquare matrices, perform a row 
reduction on the augmented matrix (4 | b) and determine if the 
system has a solution 


ola 5 


3-i 4 —2i 7 
(e)| 4-2: 7+6i =| 3 
-5+3i -10—14i 

3-i 4—2i 7 
4—2i 74+6i =| 3 
—5+4+3i —10-— 14i 


(£) 


4. Approximately solve each of the nonsquare systems from 
problem 3 using the pseudoinverse method. 


5. For each of the solutions $\vec{y}$ found in problem 4, compute $A\vec{y}$ and
compare it to $\vec{b}$ by evaluating $|A\vec{y} - \vec{b}|$.

6. Let $A \in \mathbb{R}^{1 \times n}$ for n > 1. For n = 2, 3, find $A^TA$ and $\det(A^TA)$.

7. Give an argument, based on bases and dimension, as to why no
pseudoinverse can exist for $A \in \mathbb{R}^{m \times n}$ with m < n.

8. Verify the following properties of the pseudoinverse. Here, you
may assume that A is an arbitrary m × n matrix, and that c ∈ R.

(a) $A\,p(A)\,A = A$    (b) $p(p(A)) = A$    (c) $p(cA) = \frac{1}{c}\,p(A)$

9. Let $A \in \mathbb{R}^{m \times 1}$ for m > 1. Find the formula for the pseudoinverse
p(A).

10. Is p(AB) = p(B) p(A)? If yes, explain why; if no, give an
example of its failure.

11. Is $p(A^T) = p(A)^T$? If yes, explain why; if no, give an example of
its failure.


Mathematica Problems 


1. Compute the pseudoinverse of each of the following matrices (if 


no pseudoinverse exists, then state so: 


28 4 
20 1 
()) 45 9 
| 
4-8 32 
(c)| 0 -5 6 2 
k Sea 9 
2 1 2 -8 
-3 4-8 9 
Sls t 2d 
8 -2 10 0 


2. For each of the following nonsquare systems, perform a row 
reduction on the augmented matrix (A|b) and determine if the 


system has a solution 


=a 


2 l1—i 
-5 


HE. 




p(cA) = ~ p(A) 


8 4 
0 1 
5 2 
1 —4 
2 2 
-í 1 5 
2 —4 3 
—6 5 9 
4 -5 3 
2 4 -9 
2 12 —4 
—l —2+1 
2— 5i 3 
1 1+ 5% 
0 3+ 2% 


3 4 3 4 >? 1 2 1 
6 40 mt 6 40 -3 
afa 1 ,/%=] 9 b 1il z= 2 
6 44 3 6 44 2 
J =2i I 0 2 
3 2 =] 5 -3 
(c)| 7 2 -4 6)/7=] 4 
4-5 6 2 2 
3 0 11 -10 -2 
3 -2 1 0 2 
3 5 -20 14 6 
(d)| 7 2 -4 6| z= 4 
4 -5 6 2 2 
3: 0 11 —10 ~9 


3. Approximately solve each of the nonsquare systems from 
problem 2 using the pseudoinverse method. 


4. For each of the solutions $\vec{y}$ found in problem 3, compute $A\vec{y}$ and
compare it to $\vec{b}$ by evaluating $|A\vec{y} - \vec{b}|$.


5. Create an overdetermined linear system of planes that intersect in
a common line. Find the pseudoinverse approximate solution for
this system and determine its distance from this line of intersection.


6. Create an overdetermined linear system whose solution set S has 
dimension greater than one; hence an infinite number of solutions to 
the system exist. Find the pseudoinverse approximate solution for 
this system and determine its distance from this solution S. 


7. Let $A \in \mathbb{R}^{2 \times n}$. For n = 3, 4, find $A^TA$ and $\det(A^TA)$.


11.2 Fits and Pseudoinverses 


In Section 11.1, we found a way to obtain an approximate solution to a 
system of equations when there were more equations than unknowns. It 
should be clear that in this situation, the chances of finding an exact 
solution are small. Normally, one needs the same number of linearly 
independent equations as unknowns to find a unique solution. If a row 
reduction is performed on the augmented matrix corresponding to one of
these overdetermined systems, one usually arrives at an equation of the
form 0 = 1. Any rref matrix with a row corresponding to the equation 0 = 
1 tells us that no solution exists to the system. This is an important piece 
of information, but it tells us nothing about how close we can come to 
finding an approximate solution to the system of equations. The 
pseudoinverse method allows us to find an approximate solution when no 
exact solution exists, and find exact solutions in the rare instances when 
exact solutions exist to overdetermined systems. 


Consider a set S, consisting of at least three points in the xy-plane:

\[ S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\} \]

If we plot these points, and it appears that they lie fairly close to being on
the graph of a line L: y = ax + b, for some unknowns a and b, would it not
be nice to be able to determine the equation of this line? This line L is
called a line of fit to the data. What properties should this line of fit
satisfy? We would hope that if we plug each point of the set S into the
equation defined by L, then $y_j = ax_j + b$ would be approximately true.
Notice that with n points, we end up with n equations with only two
unknowns, given by

\[ \begin{aligned} ax_1 + b &= y_1 \\ ax_2 + b &= y_2 \\ &\ \,\vdots \\ ax_n + b &= y_n \end{aligned} \tag{11.4} \]


We already know that such a system most likely has no solution, but we
can use the pseudoinverse to get an approximate solution $\vec{y}$. We must first
set up this overdetermined system. To do this, notice that in matrix form,
each equation from (11.4) can be expressed in terms of matrix
multiplication as

\[ \begin{pmatrix} x_j & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = y_j \]

Using this for all n points, we can now express the complete system of
equations as the matrix equation given by

\[ \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \tag{11.5} \]

This equation is of the form $A\vec{x} = \vec{b}$, which we can convert to the square
form given in (11.2). An approximate solution can now be found, and is
given by $\vec{y} = p(A)\vec{b}$, assuming that the pseudoinverse of A exists. This
solution, $\vec{y}$, produces a line L of fit to this set. Notice that in equation
(11.5), $A \in \mathbb{R}^{n \times 2}$, $\vec{y} \in \mathbb{R}^{2 \times 1}$, and $\vec{b} \in \mathbb{R}^{n \times 1}$.

Example 11.2.1. As an example, consider the following situation. A bookstore has the
following demand data for each week of the first quarter, that is, the months January,
February, and March:


week    books sold    average price ($)
1       415           13.25
2       372           13.85
3       391           13.60
4       428           13.15
5       403           13.65
6       350           14.05
7       362           13.90
8       410           13.20
9       385           13.60
10      434           13.00
11      465           12.75
12      380           13.65


These data can be interpreted as points of the form (x, p), where x is the number of books
sold that week and p is their average selling price, so our dataset will have twelve points:


S = {(415, $13.25), (372, $13.85), (391, $13.60), (428, $13.15),
(403, $13.65), (350, $14.05), (362, $13.90), (410, $13.20),
(385, $13.60), (434, $13.00), (465, $12.75), (380, $13.65)}


Now that we have our data, we can plot them (Fig. 11.3), along with the line of “best” fit, 
which can be found by the pseudoinverse method. We will have Mathematica perform all 
the necessary computations for us: 


Figure 11.3: The dataset representing the number of books sold 
versus the average selling price. 


S = {{415, 13.25}, {372, 13.85}, {391, 13.60}, {428, 13.15}, {403,
    13.65}, {350, 14.05}, {362, 13.90}, {410, 13.20}, {385, 13.60}, {434,
    13.00}, {465, 12.75}, {380, 13.65}};


PlotS = ListPlot[S, PlotMarkers -> {"+", Medium}, Axes -> True,
  AxesOrigin -> {300, 12}]


[Plot: "x books sold at average price $p", with x from 300 to 450 on the horizontal axis and $p from 12 to 14 on the vertical axis.]


(A = Join[Drop[S, {}, {2}], Table[{1}, {i, 1, 12}], 2]) // MatrixForm


415  1
372  1
391  1
428  1
403  1
350  1
362  1
410  1
385  1
434  1
465  1
380  1


B = Drop[S, {}, {1}];
PseudoInv[z_] := Inverse[Transpose[z].z].Transpose[z];


Y = Flatten[PseudoInv[A].B]
{-0.0117255, 18.1561}


So our line of best fit to these data using the pseudoinverse is 


\[ p(x) = -0.0117255\,x + 18.1561 \tag{11.6} \]


where p is used instead of y, since this is really the average price formula per week for p 
in terms of x, which is how many books are sold that week. Now we want to plot it with 
the data (see Fig. 11.4). 


LinePlot = Plot[Y[[1]] x + Y[[2]], {x, 300, 500}, PlotStyle -> Red];
Show[PlotS, LinePlot, PlotRange -> All]


Figure 11.4: The dataset and the line of best fit L.


[Plot: "x books sold at average price $p", showing the data and the fitted line, with x from 300 to 500 on the horizontal axis and $p from 12 to 14 on the vertical axis.]


On inspection of Figure 11.4, we see that the line determined by the pseudoinverse seems 
an excellent fit to the data, but is it the best fit to the data? What should “best fit” even 
mean? Once again, we will address these questions in Section 11.3, which discusses 
least-squares fits. 
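As a cross-check on the hand-built PseudoInv, the same line can be requested from three built-in Mathematica functions. This is a sketch, not part of the original example; the variable names A2 and b2 are hypothetical, chosen so as not to overwrite A and B above.

```mathematica
S = {{415, 13.25}, {372, 13.85}, {391, 13.60}, {428, 13.15}, {403, 13.65},
   {350, 14.05}, {362, 13.90}, {410, 13.20}, {385, 13.60}, {434, 13.00},
   {465, 12.75}, {380, 13.65}};

A2 = Map[{First[#], 1} &, S];   (* rows {x_j, 1}, as in (11.5) *)
b2 = S[[All, 2]];               (* the observed prices p_j *)

PseudoInverse[A2].b2            (* built-in Moore-Penrose pseudoinverse *)
LeastSquares[A2, b2]            (* built-in least-squares solver *)
Fit[S, {x, 1}, x]               (* built-in fit in the basis {x, 1} *)
```

Because the two columns of A2 are linearly independent, all three results should agree with the slope and intercept in (11.6) up to rounding; Fit returns them as an expression in x rather than as a list.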


Now what do you do if the dataset does not seem to be fit by a line, but a 


curve of some type, such as a polynomial, or a sum of sines and cosines? 
Now let us look at an example of each of these kinds of fit. 




Example 11.2.2. Let us take an approximate dataset S of datapoints (t, y), where t is the 
time in seconds after the launch of a rocket, and y is its height in feet above ground level 
at time t: 


S = {(0,925), (2, 1005), (4, 1135), (6, 1215), (8, 1345), 
(10, 1265), (12, 1130), (14, 975), (16, 795)} 


First, we want to plot these data, and recognize what type of curve would best fit them. 
The data are plotted in Figure 11.5. 


S = {{0, 925}, {2, 1005}, {4, 1135}, {6, 1215}, {8, 1345}, {10, 1265},
   {12, 1130}, {14, 975}, {16, 795}};


PlotS = ListPlot[S, PlotMarkers -> {"+", Medium}, Axes -> True,
  AxesOrigin -> {-1, 790}]


Figure 11.5: The data measured during a rocket flight; a parabola
would fit these data more accurately than a line.


[Plot: "Height y feet of projectile after t seconds", with t from 0 to 15 on the horizontal axis and y from about 900 to 1300 on the vertical axis.]

Figure 11.5 looks roughly like the plot of a standard parabola opening downward, which
has the form $y = at^2 + bt + c$. Let us use the pseudoinverse to find a parabola that fits
these data and plot the dataset with this parabola. To do this, we must first understand
how to modify the linear fit problem from the previous example. The standard form of a
parabola has three unknown constants a, b, and c. Our hope is that each point $(t_k, y_k)$
satisfies $y_k = at_k^2 + bt_k + c$. As with the linear case, this is most likely not
possible. The best we can hope for is an approximate solution. So, similar to the
line-of-fit problem, we can express $y_k = at_k^2 + bt_k + c$ in matrix form as
follows:

\[ \begin{pmatrix} t_k^2 & t_k & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = y_k \]

We can now construct a matrix system of the form (11.1) given by

\[ \begin{pmatrix} t_1^2 & t_1 & 1 \\ t_2^2 & t_2 & 1 \\ \vdots & \vdots & \vdots \\ t_9^2 & t_9 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_9 \end{pmatrix} \]

Compare this matrix equation to that of (11.5) and see if you can generalize this process
to a function with k unknown constants and n known points. Back to our specific
example, we have nine points and three unknown constants in our parabola, so
$A \in \mathbb{R}^{9 \times 3}$, $\vec{y} \in \mathbb{R}^{3 \times 1}$, and $\vec{b} \in \mathbb{R}^{9 \times 1}$. Our parabola of fit should be given by the
solution $\vec{y} = p(A)\vec{b}$, which we shall find in the commands below. Pay special attention
to how we define the matrix A:


(A = Table[{S[[i, 1]]^2, S[[i, 1]], 1}, {i, 1, 9}]) // MatrixForm

0 0 1
4 2 1
16 4 1
36 6 1
64 8 1
100 10 1
144 12 1
196 14 1
256 16 1


(B = Drop[S, {}, {1}]) // MatrixForm


925 
1005 
1135 
1215 
1345 
1265 
1130 
975 

795 


Y = Flatten[N[PseudoInv[A].B]]
{-6.80736, 104.168, 871.636}


So now we have that the parabola of fit to this dataset has the equation


$$y = -6.80736t^2 + 104.168t + 871.636$$
Let us plot this parabola (Fig. 11.6), along with the dataset:


Figure 11.6: Rocket trajectory dataset and the parabola 
approximation. 


ParabolaPlot = Plot[Y[[1]] t^2 + Y[[2]] t + Y[[3]], {t, 0, 22}, PlotStyle -> Red];
Show[PlotS, ParabolaPlot, AxesOrigin -> {0, 0}, PlotRange -> All]


[Plot: "Height y feet of projectile after t seconds", showing the data points and the fitted parabola, with y on the vertical axis and t on the horizontal axis.]


From Figure 11.6, we see that the fit of this parabola is fairly good and it allows us to say 
that the rocket will hit a target on ground level at roughly 21 seconds after launch. 
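The landing-time estimate can also be checked numerically rather than read off the plot. The following is a minimal sketch, assuming that the built-in PseudoInverse may stand in for the text's PseudoInv command; the names coeffs, height, and landing are introduced here for illustration only. It refits the parabola and then solves for the positive time at which the fitted height is zero.

```mathematica
(* Rocket data from Example 11.2.2 *)
S = {{0, 925}, {2, 1005}, {4, 1135}, {6, 1215}, {8, 1345},
     {10, 1265}, {12, 1130}, {14, 975}, {16, 795}};

(* Rows {t^2, t, 1} and the list of measured heights *)
A = Table[{S[[i, 1]]^2, S[[i, 1]], 1}, {i, 1, Length[S]}];
B = S[[All, 2]];

(* Assumption: built-in PseudoInverse plays the role of PseudoInv *)
coeffs = N[PseudoInverse[A].B];

(* Fitted height as a function of time *)
height[t_] := coeffs.{t^2, t, 1};

(* Keep only the positive root of height[t] == 0 *)
landing = Select[t /. NSolve[height[t] == 0, t], Positive]
```

The single positive root should come out a little over 21 seconds, in line with the estimate read from Figure 11.6.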


Example 11.2.3. Now let us take a dataset of stock market data for the price p of a 
particular stock at half-hour intervals for the first 6 hours after the market has opened, 
that is, from 9:00 A.M. to 3:00 P.M. The data are given as points (t, v), where v is the 
stock value at time t hours from the opening of the market:


S = {(0, $17.83), (.5, $17.71), (1, $17.56), (1.5, $17.60), (2, $17.65), 
(2.5, $17.55), (3, $17.52), (3.5, $17.58), (4, $17.60), (4.5, $17.47), 
(5, $17.44), (5.5, $17.61), (6, $17.69)} 


First, we plot the data to determine what type of function, or set of functions, would best 
approximate them (see Fig. 11.7). 


S = {{0, 17.83}, {.5, 17.71}, {1, 17.56}, {1.5, 17.60}, {2, 17.65}, 
{2.5, 17.55}, {3, 17.52}, {3.5, 17.58}, {4, 17.60}, {4.5, 17.47}, {5, 
17.44}, {5.5, 17.61}, {6, 17.69}}; 


PlotS = ListPlot[S, PlotMarkers -> {"+", Medium}, Axes -> True,
AxesOrigin -> {-0.2, 17.4}]


Figure 11.7: Plot of stock market data. 


[Plot: "Stock price $p at t hours after opening bell", with $p on the vertical axis and t on the horizontal axis.]


This pattern is similar to a wave pattern that suggests the use of trigonometric functions, such as sine and cosine. Let us try approximating it on the interval [0, 7] since this covers the hours of operation of the stock market. Now both $\sin\left(\frac{2\pi t}{7}\right)$ and $\cos\left(\frac{2\pi t}{7}\right)$ have period 7 instead of $2\pi$. So we shall use these functions with positive integer multiples of these angles, as well as the constant function 1, to build our approximating function. Thus, we shall approximate this dataset by a function of the form

$$v(t) = a_0 + a_1\cos\left(\frac{2\pi t}{7}\right) + a_2\cos\left(\frac{4\pi t}{7}\right) + a_3\cos\left(\frac{6\pi t}{7}\right) + a_4\sin\left(\frac{2\pi t}{7}\right) + a_5\sin\left(\frac{4\pi t}{7}\right) + a_6\sin\left(\frac{6\pi t}{7}\right)$$


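Note that we could have chosen a different function, perhaps with more sine and cosine functions, but the function above is a good first guess. On inspection of Figure 11.8, we see that our guess is sufficient. Back to the problem at hand, our overdetermined linear system can be expressed in matrix form (11.1), with

$$A = \begin{bmatrix} 1 & \cos\left(\tfrac{2\pi t_1}{7}\right) & \cos\left(\tfrac{4\pi t_1}{7}\right) & \cos\left(\tfrac{6\pi t_1}{7}\right) & \sin\left(\tfrac{2\pi t_1}{7}\right) & \sin\left(\tfrac{4\pi t_1}{7}\right) & \sin\left(\tfrac{6\pi t_1}{7}\right) \\ 1 & \cos\left(\tfrac{2\pi t_2}{7}\right) & \cos\left(\tfrac{4\pi t_2}{7}\right) & \cos\left(\tfrac{6\pi t_2}{7}\right) & \sin\left(\tfrac{2\pi t_2}{7}\right) & \sin\left(\tfrac{4\pi t_2}{7}\right) & \sin\left(\tfrac{6\pi t_2}{7}\right) \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & \cos\left(\tfrac{2\pi t_{13}}{7}\right) & \cos\left(\tfrac{4\pi t_{13}}{7}\right) & \cos\left(\tfrac{6\pi t_{13}}{7}\right) & \sin\left(\tfrac{2\pi t_{13}}{7}\right) & \sin\left(\tfrac{4\pi t_{13}}{7}\right) & \sin\left(\tfrac{6\pi t_{13}}{7}\right) \end{bmatrix}$$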


(A = Chop[N[Table[{1, Cos[2 Pi/7 S[[i, 1]]], Cos[4 Pi/7 S[[i, 1]]], Cos[6 Pi/7 S[[i, 1]]],
Sin[2 Pi/7 S[[i, 1]]], Sin[4 Pi/7 S[[i, 1]]], Sin[6 Pi/7 S[[i, 1]]]}, {i, 1, 13}]]]) // MatrixForm


1.  1.         1.         1.         0.         0.         0.
1.  0.900969   0.62349    0.222521   0.433884   0.781831   0.974928
1.  0.62349   -0.222521  -0.900969   0.781831   0.974928   0.433884
1.  0.222521  -0.900969  -0.62349    0.974928   0.433884  -0.781831
1. -0.222521  -0.900969   0.62349    0.974928  -0.433884  -0.781831
1. -0.62349   -0.222521   0.900969   0.781831  -0.974928   0.433884
1. -0.900969   0.62349   -0.222521   0.433884  -0.781831   0.974928
1. -1.         1.        -1.         0          0          0
1. -0.900969   0.62349   -0.222521  -0.433884   0.781831  -0.974928
1. -0.62349   -0.222521   0.900969  -0.781831   0.974928  -0.433884
1. -0.222521  -0.900969   0.62349   -0.974928   0.433884   0.781831
1.  0.222521  -0.900969  -0.62349   -0.974928  -0.433884   0.781831
1.  0.62349   -0.222521  -0.900969  -0.781831  -0.974928  -0.433884


B = Drop[S, {}, {1}];
Y = Flatten[N[PseudoInv[A].B]]


{17.62, 0.119301, 0.0737797, 0.0100651, 0.00735615, -0.0518886, -0.0589496}


SinCosPlot = Plot[Y[[1]] + Y[[2]] Cos[2 Pi/7 t] + Y[[3]] Cos[4 Pi/7 t] +
Y[[4]] Cos[6 Pi/7 t] + Y[[5]] Sin[2 Pi/7 t] + Y[[6]] Sin[4 Pi/7 t] +
Y[[7]] Sin[6 Pi/7 t], {t, 0, 7}, PlotStyle -> Red];

Show[PlotS, SinCosPlot]


Figure 11.8: Plot of stock market data for the price of a certain stock over the course of the trading day and the trigonometric fit function.


[Plot: "Stock price $p at t hours after opening bell", showing the data points and the fitted trigonometric curve.]
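One way to put a number on the quality of a trigonometric fit of this kind is to compute the squared error $\|A\vec{x} - \vec{b}\|^2$ for fits that use one, two, or three sine-cosine pairs. The sketch below is an illustration only: it assumes the built-in PseudoInverse in place of the text's PseudoInv, and the helper names basis and sqError are hypothetical.

```mathematica
(* Stock data from Example 11.2.3 *)
S = {{0, 17.83}, {.5, 17.71}, {1, 17.56}, {1.5, 17.60}, {2, 17.65},
     {2.5, 17.55}, {3, 17.52}, {3.5, 17.58}, {4, 17.60}, {4.5, 17.47},
     {5, 17.44}, {5.5, 17.61}, {6, 17.69}};

(* Basis {1, cos(2 Pi k t/7), sin(2 Pi k t/7)} for k = 1, ..., n, evaluated at t *)
basis[n_, t_] := Join[{1},
   Table[Cos[2 Pi k t/7], {k, 1, n}],
   Table[Sin[2 Pi k t/7], {k, 1, n}]];

(* Squared error of the pseudoinverse fit that uses n harmonics *)
sqError[n_] := Module[{A, b, x},
   A = N[Table[basis[n, S[[i, 1]]], {i, 1, Length[S]}]];
   b = S[[All, 2]];
   x = PseudoInverse[A].b;
   Norm[A.x - b]^2];

errors = Table[sqError[n], {n, 1, 3}]
```

Because each larger basis contains the smaller one, the squared error cannot increase as n grows; the last entry corresponds to the seven-function fit used in this example.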


We seem to have a fairly decent approximation using these trigonometric functions, and the approximation can be improved by adding more trig functions to those already used. The question becomes: What types of functions can we use in our approximation? Similar to the case of vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$, we would like the set of functions to be linearly independent. But what does it mean for a set of functions to be linearly independent?


Definition 11.2.1. Given a set of n functions $\{f_1(x), f_2(x), \ldots, f_n(x)\}$ on an interval $I$, where each $f_k$ is at least $n - 1$ times differentiable, the Wronskian $W$ is a function of $x \in I$ defined by

$$W(f_1, f_2, \ldots, f_n)(x) = \det \begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{bmatrix}$$

Simply stated, the Wronskian of the set of functions $\{f_1(x), f_2(x), \ldots, f_n(x)\}$ is the determinant of the matrix whose kth column consists of $f_k(x)$ followed, sequentially, by its first $n - 1$ derivatives.


Definition 11.2.2. A set of functions $\{f_1(x), f_2(x), \ldots, f_n(x)\}$ on an interval $I = [a, b]$ is said to be a dependent set of functions if there is at least one function $f_j(x)$ in the set that is a linear combination of the remaining functions of the set on $I$; that is, there are real scalars $a_1, a_2, \ldots, a_{j-1}, a_{j+1}, \ldots, a_n$ such that for all $x \in I$,

$$f_j(x) = a_1 f_1(x) + \cdots + a_{j-1} f_{j-1}(x) + a_{j+1} f_{j+1}(x) + \cdots + a_n f_n(x)$$


Definition 11.2.3. A set of functions $\{f_1(x), f_2(x), \ldots, f_n(x)\}$ on an interval $I = [a, b]$ is said to be an independent set of functions if they are not a dependent set. More precisely, this set of functions is an independent set on the interval $I$ if

$$a_1 f_1(x) + a_2 f_2(x) + \cdots + a_n f_n(x) = 0$$
for all $x \in I$ forces all n scalars $a_1, a_2, \ldots, a_n$ to be 0.


Example 11.2.4. So the set of functions {1, x, x, 2, 8x, — 5x + 2} is a dependent set of 
functions on any interval I since the fifth function is clearly a linear combination of the
rest of the functions in this set. 


The hard part is to determine when a set of functions is independent; 
happily this can be done through the use of the Wronskian. We can 
combine the concepts of linearly independent and dependent sets of 
functions and the Wronskian in the following theorem. 


Theorem 11.2.1. A set of functions $\{f_1(x), f_2(x), \ldots, f_n(x)\}$ on an interval $I = [a, b]$, where each $f_k$ is at least $n - 1$ times differentiable, is linearly independent if there exists $x \in I$ such that

$$W(f_1, f_2, \ldots, f_n)(x) \neq 0.$$


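Proof. We will prove the contrapositive statement that if the set $\{f_1(x), f_2(x), \ldots, f_n(x)\}$ is linearly dependent on $I$, then $W(f_1, f_2, \ldots, f_n)(x) = 0$ for all $x \in I$. By definition, if the functions $\{f_1(x), f_2(x), \ldots, f_n(x)\}$ are linearly dependent on the interval $I$, then

$$a_1 f_1(x) + a_2 f_2(x) + \cdots + a_n f_n(x) = 0 \tag{11.7}$$

for constants $a_1, a_2, \ldots, a_n$, of which at least one is nonzero. The kth derivative of equation (11.7) satisfies

$$a_1 f_1^{(k)}(x) + a_2 f_2^{(k)}(x) + \cdots + a_n f_n^{(k)}(x) = 0$$

for $1 \le k \le n - 1$. In matrix form, these n equations become

$$\begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

This equation has a nontrivial solution only when the determinant of the matrix on its LHS is zero. But this determinant is simply the Wronskian, so

$$W(f_1, f_2, \ldots, f_n)(x) = 0$$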


Example 11.2.5. We will compute the Wronskian of the set of functions

$$\{f_1(x) = 1,\ f_2(x) = x,\ f_3(x) = x^2\}, \quad x \in [0, 20]$$

which was used in Example 11.2.2 to model the trajectory of a rocket.

$$W(f_1, f_2, f_3)(x) = \det \begin{bmatrix} 1 & x & x^2 \\ 0 & 1 & 2x \\ 0 & 0 & 2 \end{bmatrix} = 2$$

Since the Wronskian is 2 on the entire interval [0, 20], we can conclude that the set $\{f_1(x) = 1, f_2(x) = x, f_3(x) = x^2\}$ is linearly independent on [0, 20]. Note that we could also conclude that the set is linearly independent on any interval [a, b], since the Wronskian is a nonzero constant and therefore does not depend on x.


Example 11.2.6. It would be nice to see that the Wronskian is zero when we have a linearly dependent set of functions. So we will take the functions from Example 11.2.5, and add to them the function $f_4(x) = x - 3x^2$, which can be written as $f_4(x) = f_2(x) - 3f_3(x)$:

$$W(f_1, f_2, f_3, f_4)(x) = \det \begin{bmatrix} 1 & x & x^2 & x - 3x^2 \\ 0 & 1 & 2x & 1 - 6x \\ 0 & 0 & 2 & -6 \\ 0 & 0 & 0 & 0 \end{bmatrix} = 0$$

Notice that the Wronskian is zero, independent of the interval we choose, since the last row of the matrix given in the formula consists of all zeros. Of course, the main reason that the last row vanishes is the fact that it holds the third derivatives of polynomials all of degree $\le 2$, which automatically tells us that

$$f_i^{(3)}(x) = 0, \quad i = 1, 2, 3, 4$$

Example 11.2.7. Mathematica has a built-in Wronskian command, aptly named Wronskian, and we use it on the next two sets of functions:

$$F_1 = \{\cos(t), \sin(t), \cos(2t), \sin(2t)\}$$
$$F_2 = \{\cos(t), \sin(t), \cos(2t), \sin(2t), 2\sin(t) - \cos(2t)\}$$


W1 = Wronskian[{Cos[t], Sin[t], Cos[2 t], Sin[2 t]}, t]
18


W2 = Wronskian[{Cos[t], Sin[t], Cos[2 t], Sin[2 t], 2 Sin[t] - Cos[2 t]}, t]
0


No matter what interval we choose, notice that W1(t) ≠ 0, while W2(t) = 0. Therefore, by Theorem 11.2.1, the set F1 is linearly independent on every interval I. The set F2 is linearly dependent on every interval I, since its fifth function is a linear combination of the second and third, and its Wronskian is zero, exactly as the proof of the theorem predicts.
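To see what the built-in command computes, the matrix of Definition 11.2.1 can be assembled by hand and its determinant taken directly. This is a small sketch; wronskianMatrix is a hypothetical helper name introduced here, not a built-in function.

```mathematica
(* Row k holds the k-th derivatives of all the functions, k = 0, ..., n-1 *)
wronskianMatrix[fs_List, x_] := Table[D[fs, {x, k}], {k, 0, Length[fs] - 1}];

(* The set {1, x, x^2} from Example 11.2.5 *)
M = wronskianMatrix[{1, x, x^2}, x];
M // MatrixForm
w3 = Det[M]

(* The set F1 from Example 11.2.7 *)
w4 = Simplify[Det[wronskianMatrix[{Cos[t], Sin[t], Cos[2 t], Sin[2 t]}, t]]]
```

The first determinant should agree with the value 2 found in Example 11.2.5, and the second with the value 18 returned by Wronskian above.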


Note that there are striking similarities between the work done in this section, and that of Section 4.2. However, there is one important difference. In the aforementioned section, we were required to choose n functions to fit a dataset of n points. As long as the resulting square matrix was invertible, we were guaranteed that the linear combination of the n functions would pass through all n points. The pseudoinverse method is a generalization of this process, and it reduces to the square-system setting of Section 4.2 when the number of functions equals the number of data points.


Homework Problems 


1. Determine what types of function would best fit the following 
sets of data. You may choose from linear functions, quadratic
functions, polynomials of degree > 2, a linear combination of sines 
and cosines, and exponential functions. 


© =a © 


—0.50 4.40 
—0.25 5.31 
0.00 4.62 
0.25 2.37 
0.50 | —1.60 
0.67 | —4.01 
1.00 | —3.42 
1.50 4.56 


2. Set up, but do not solve, the nonsquare system of equations for a 
plane of fit of the form z = ax + by + c given the following set of 
points in $\mathbb{R}^3$.


S = {(2,3, —2), (—1,4, —9), (0, 2, —1), (3, 4, —2), (—1, —1, 5)} 


3. Any vertical line excluding the y-axis can be expressed as ax = 1.
Using this equation, find the vertical line of fit to the pair of points
{(1, 4), (2, 3)}. What would you expect the result to be? Does the
actual answer agree with your guess?

4. Repeat problem 3, this time using an arbitrary pair of points {(x1,
y1), (x2, y2)}. Attempt to interpret the answer.

5. (a) Find the Wronskian for the set of functions 
8 = {1,e7,e"7,e**} 


(b) Is S a dependent or independent set of functions for x € [-1, 
1]? 


6. (a) Find the Wronskian for each of the following sets of
functions:


$$S = \{1, e^{x}, \sinh(x)\}, \quad T = \{e^{x}, e^{-x}, \cosh(x)\}, \quad R = \{e^{x}, \cosh(x), \sinh(x)\}$$
(b) Which of the sets, S, T, and R, are dependent for $x \in [0, 1]$?


Mathematica Problems 


1. Fit each of the following sets of data to a function of fit expressed
as a linear combination of the functions $\{1, x, x^2, x^3, x^4\}$.


(a)
   x   |   y
 -2.2  | 34.8
 -1.9  | 22.8
 -1.5  | 12.4
 -0.8  |  4.12
 -0.4  |  2.23
  0.1  |  0.83
  0.8  |  0.62
  1.2  |  2.32

(b)
   x   |   y
 -2.0  |  7.43
 -1.5  |  8.12
 -1.0  |  5.22
 -0.5  |  2.02
  0.0  |  1.01
  0.5  |  2.19
  1.5  |  8.31
  2.5  | -0.32

(c)


2. Plot the data sets and corresponding functions found by the 
pseudoinverse method for each part of problem 1. 


3. For each of the functions found in problem 1, compute the error
in the approximation by evaluating $\|A\vec{x} - \vec{b}\|$.

4. Fit each of the following sets of data to a function of fit expressed
as a linear combination of the functions $\{1, e^{-x}, e^{x}, e^{-2x}, e^{2x}\}$.


5. Plot the datasets and corresponding functions found by the
pseudoinverse method for each part of problem 4.


6. For each of the functions found in problem 4, compute the error
in the approximation by evaluating $\|A\vec{x} - \vec{b}\|$.


7. Fit each of the following sets of data to a function of fit expressed 
as a linear combination of the functions 


{1,sin (4), cos (4) , sin (4) , cos (4) }. 


8. Plot the datasets and corresponding functions found by the 
pseudoinverse method for each part of problem 7. 


9. For each of the functions found in problem 7, compute the error
in the approximation by evaluating $\|A\vec{x} - \vec{b}\|$.


10. Fit each of the following sets of data to a function of the form z 
=ax + by +c: 


11. Plot the datasets and corresponding functions found by the 
pseudoinverse method for each part of problem 10. 


12. Repeat problem 9, referring back to problem 10 instead of
problem 7.


13. (a) Find the Wronskian for each of the following sets of 
functions: 




4 = ami” mo å 5 A 
S= {1.6 ee 2 ee), TS a eee} 


(b) For arbitrary a, b, c R, are the sets S and T dependent for x 

€ [a, b]? 

(c) Can you generalize the results of parts (a) and (b)? Explain. 
14. (a) For a set of n functions $\{f_1(x, y), f_2(x, y), \ldots, f_n(x, y)\}$, come
up with a version of the Wronskian to test whether these functions
form a dependent or independent set on the rectangle $[a, b] \times [c, d] \subset \mathbb{R}^2$.

(b) Use your test in part (a) to decide whether the set {1, x, y, 

xy, xy“, xy} is a dependent or independent set on the unit 

square [0, 1] x [0,1]. 

(c) Use your test in part (a) to decide whether the set

$$\{\cos(x), \cos(y), \cos(xy), \sin(x), \sin(y), \sin(xy)\}$$

is a dependent or independent set on the square $[0, 2\pi] \times [0, 2\pi]$.


Research Projects 


1. Explain the mathematics involved in medical imaging, such as 
MRI (magnetic resonance imaging), and how the image is 
produced. 

2. Explain a use for the pseudoinverse and solving overdetermined 
systems of linear equations in your particular field of study. 


11.3 Least-Squares Fits and Pseudoinverses


We have spent a sufficient amount of time utilizing the pseudoinverse to fit curves to datasets. The next step is to determine whether this method results in the best possible approximation. To do this, we need to determine a measure of how good a fit is, and then how to get the best one. The simplest measure of how well a function fits a fixed dataset is called the squared deviation. To compute the squared deviation for a given function and dataset, one simply sums the squares of the differences between the function values and the exact values from the dataset. In our situation, we are concerned with a linear combination of functions of the form

$$y = a_1 f_1(x) + a_2 f_2(x) + \cdots + a_n f_n(x)$$
and corresponding dataset
$$S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$$


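We will consider only the case of m > n, the case in which the pseudoinverse method is used to find a curve of fit. From the function and corresponding dataset given above, we arrive at the system of equations

$$y_k = a_1 f_1(x_k) + a_2 f_2(x_k) + \cdots + a_n f_n(x_k), \quad k = 1, 2, \ldots, m \tag{11.9}$$

which, in matrix form $F\vec{a} = \vec{y}$, is given by

$$\begin{bmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_n(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_n(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(x_m) & f_2(x_m) & \cdots & f_n(x_m) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \tag{11.10}$$

To solve system (11.10), we simply apply the pseudoinverse method to get $\vec{a} = P(F)\vec{y}$. The squared deviation is given by

$$D(\vec{a}) = \|\vec{y} - F\vec{a}\|^2 \tag{11.11}$$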
Now notice that D is actually a function of the vector $\vec{a}$, whose components are the constants $a_1, a_2, \ldots, a_n$ in the linear combination defined in equation (11.9). If we change just one of the $a_k$ terms, then $D(\vec{a})$ also changes. Hence, D is a function of n variables. To find the value of $\vec{a}$ that minimizes D, we must take the gradient of D, set it equal to zero, and then solve for the unknown coefficients. Remembering our multivariable calculus, the gradient of $D(\vec{a})$ is given by

$$\nabla D(\vec{a}) = \left( \frac{\partial D(\vec{a})}{\partial a_1}, \frac{\partial D(\vec{a})}{\partial a_2}, \ldots, \frac{\partial D(\vec{a})}{\partial a_n} \right)$$


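In order to actually perform any computations with $\nabla D(\vec{a})$, we must first rewrite (11.11) in a way that allows us to take a partial derivative with respect to each of the $a_k$. The following expression obviously looks more complicated, but is just a quadratic function in the $a_k$ terms, and can easily have the gradient operator applied to it:

$$D(a_1, a_2, \ldots, a_n) = \|\vec{y} - F\vec{a}\|^2 = \sum_{k=1}^{m} \left[ y_k - \left( a_1 f_1(x_k) + a_2 f_2(x_k) + \cdots + a_n f_n(x_k) \right) \right]^2 = \sum_{k=1}^{m} \left[ y_k - \sum_{r=1}^{n} a_r f_r(x_k) \right]^2 \tag{11.12}$$

Now we can take the partial derivative of the above expression with respect to an arbitrary coefficient $a_j$ as follows:

$$\frac{\partial D}{\partial a_j} = \frac{\partial}{\partial a_j} \sum_{k=1}^{m} \left[ y_k - \sum_{r=1}^{n} a_r f_r(x_k) \right]^2 = \sum_{k=1}^{m} \frac{\partial}{\partial a_j} \left[ y_k - \sum_{r=1}^{n} a_r f_r(x_k) \right]^2 = \sum_{k=1}^{m} -2 f_j(x_k) \left[ y_k - \sum_{r=1}^{n} a_r f_r(x_k) \right]$$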


Setting $\nabla D = \vec{0}$ is equivalent to setting each component to zero. We must now think once again of our main goal. We would like to show that the pseudoinverse solution corresponds to the minimum of the squared deviation. In order to do this, we will rewrite $\frac{\partial D}{\partial a_j} = 0$ as follows:

$$\sum_{k=1}^{m} \sum_{r=1}^{n} a_r f_j(x_k) f_r(x_k) = \sum_{k=1}^{m} f_j(x_k) y_k$$

To arrive at this expression, we simply divided both sides of $\frac{\partial D}{\partial a_j} = 0$ by $-2$ and moved the negative portion of the sum to the other side of the equal sign. We now have n of these equations, and thus end up with the following very complicated-looking system of equations:

$$\begin{aligned} \sum_{k=1}^{m} \sum_{r=1}^{n} a_r f_1(x_k) f_r(x_k) &= \sum_{k=1}^{m} f_1(x_k) y_k \\ \sum_{k=1}^{m} \sum_{r=1}^{n} a_r f_2(x_k) f_r(x_k) &= \sum_{k=1}^{m} f_2(x_k) y_k \\ &\ \,\vdots \\ \sum_{k=1}^{m} \sum_{r=1}^{n} a_r f_n(x_k) f_r(x_k) &= \sum_{k=1}^{m} f_n(x_k) y_k \end{aligned} \tag{11.13}$$


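Now remember, this system of equations is formulated in terms of the unknown constants $a_1, a_2, \ldots, a_n$, and a solution to this system corresponds to a minimum of the squared deviation. Our goal now is to rewrite this system as a single matrix equation. Let us first focus on the RHS of (11.13), which can be rewritten as

$$\begin{bmatrix} \sum_{k=1}^{m} f_1(x_k) y_k \\ \sum_{k=1}^{m} f_2(x_k) y_k \\ \vdots \\ \sum_{k=1}^{m} f_n(x_k) y_k \end{bmatrix} = \begin{bmatrix} f_1(x_1) & f_1(x_2) & \cdots & f_1(x_m) \\ f_2(x_1) & f_2(x_2) & \cdots & f_2(x_m) \\ \vdots & \vdots & \ddots & \vdots \\ f_n(x_1) & f_n(x_2) & \cdots & f_n(x_m) \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$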


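We have just shown that the RHS of (11.13), in matrix form, can be represented by $F^T\vec{y}$; what an observation! At this point, we should pause and think back to our pseudoinverse approach to solving nonsquare systems. The first step in this process was to make the system square. In our current case, the nonsquare equation $F\vec{a} = \vec{y}$ is made square by multiplying both sides, on the left, by $F^T$. Thus our square system is $F^TF\vec{a} = F^T\vec{y}$. It is now our hope that the following matrix equation, corresponding to the LHS of (11.13), is valid:

$$\begin{bmatrix} \sum_{k=1}^{m} \sum_{r=1}^{n} a_r f_1(x_k) f_r(x_k) \\ \sum_{k=1}^{m} \sum_{r=1}^{n} a_r f_2(x_k) f_r(x_k) \\ \vdots \\ \sum_{k=1}^{m} \sum_{r=1}^{n} a_r f_n(x_k) f_r(x_k) \end{bmatrix} = \begin{bmatrix} f_1(x_1) & f_1(x_2) & \cdots & f_1(x_m) \\ f_2(x_1) & f_2(x_2) & \cdots & f_2(x_m) \\ \vdots & \vdots & \ddots & \vdots \\ f_n(x_1) & f_n(x_2) & \cdots & f_n(x_m) \end{bmatrix} \begin{bmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_n(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_n(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(x_m) & f_2(x_m) & \cdots & f_n(x_m) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$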


It should take only a few minutes to realize that this equation is indeed true. Now we need to ask, What does this mean? Once again, system (11.13) corresponds to the equations that must be satisfied if $\vec{a}$ is a minimum for D. Now we have shown that this system can be expressed in matrix form as $F^TF\vec{a} = F^T\vec{y}$. When $F^TF$ is invertible, the solution to this matrix equation is $\vec{a} = (F^TF)^{-1}F^T\vec{y} = P(F)\vec{y}$, which is the pseudoinverse solution to the nonsquare system $F\vec{a} = \vec{y}$. Therefore, the pseudoinverse method yields a solution corresponding to a minimum of the least-squared deviation, which is why this approach is sometimes referred to as the method of least-squares. Note that this approach works as it does because the gradient method results in a linear system of equations with respect to the unknown $a_k$ values. The reason why the gradient method gives us this linear system is due to the way in which equation (11.9) is written. If a different combination were chosen such that the $a_k$ terms did not appear linearly, system (11.13) would not be the end result of the gradient method, thus negating the relationship between the pseudoinverse method and the least-squared deviation.
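The argument above can be checked on a concrete dataset. The sketch below reuses the rocket data of Example 11.2.2 and compares three routes to the coefficients: solving the normal equations $F^TF\vec{a} = F^T\vec{y}$, applying the built-in PseudoInverse (assumed here as a stand-in for the text's PseudoInv), and setting the gradient of D to zero symbolically. All variable names are illustrative.

```mathematica
(* Rocket data from Example 11.2.2, fitted with the functions t^2, t, 1 *)
S = {{0, 925}, {2, 1005}, {4, 1135}, {6, 1215}, {8, 1345},
     {10, 1265}, {12, 1130}, {14, 975}, {16, 795}};
F = Table[{S[[k, 1]]^2, S[[k, 1]], 1}, {k, 1, Length[S]}];
y = S[[All, 2]];

(* Route 1: solve the normal equations F^T F a = F^T y *)
normalSol = LinearSolve[Transpose[F].F, Transpose[F].y];

(* Route 2: pseudoinverse solution *)
pinvSol = PseudoInverse[F].y;

(* Route 3: set the gradient of the squared deviation to zero *)
vars = {a1, a2, a3};
dev = Total[(y - F.vars)^2];
gradSol = vars /. First[Solve[Thread[D[dev, {vars}] == 0], vars]];

(* All three routes should agree exactly, since the data are integers *)
{normalSol == pinvSol, normalSol == gradSol}
N[normalSol]
```

Because the data are exact integers, the comparisons are made in exact arithmetic, and the numerical values should match the coefficients reported in Example 11.2.2.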


The most common application of the method of least-squared deviation is the linear case. If our function is given by $y = a_1x + a_0$, then notice that if we have more than two points in our dataset, a perfect fit will be highly unlikely. Following the work done in the arbitrary case, we will consider the case of m points in our dataset: $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$. The squared deviation D will be a function of the variables $a_0$ and $a_1$ in this instance, with

$$D(a_0, a_1) = \|\vec{y} - F\vec{a}\|^2 = \sum_{k=1}^{m} \left( y_k - (a_1 x_k + a_0) \right)^2$$

where $\vec{y}$ is the column of y-coordinates, F is the m × 2 matrix whose rows are $[x_k, 1]$, for $1 \le k \le m$, and $\vec{a} = [a_1, a_0]^T$. Then, $D(a_0, a_1) = 0$ if and only if $y_k = a_1x_k + a_0$ for each $1 \le k \le m$, which is equivalent to stating that the overdetermined system of linear equations $y_k = a_1x_k + a_0$ is satisfied for all k. As previously pointed out, this is highly unlikely. The best choice of the values of $a_0$ and $a_1$ would be those that give the least-squared deviation $D(a_0, a_1)$. Following the more general procedure, we have to solve $\nabla D = \vec{0}$, which is equivalent to simultaneously satisfying the two equations $\frac{\partial D}{\partial a_0} = 0$ and $\frac{\partial D}{\partial a_1} = 0$, where

$$\frac{\partial D}{\partial a_0} = \sum_{k=1}^{m} -2 \left( y_k - (a_1 x_k + a_0) \right), \qquad \frac{\partial D}{\partial a_1} = \sum_{k=1}^{m} -2 x_k \left( y_k - (a_1 x_k + a_0) \right) \tag{11.14}$$


Setting both of these partial derivatives to zero and performing some algebraic manipulation, we arrive at the following system of equations:

$$\begin{aligned} a_0 \sum_{k=1}^{m} 1 + a_1 \sum_{k=1}^{m} x_k &= \sum_{k=1}^{m} y_k \\ a_0 \sum_{k=1}^{m} x_k + a_1 \sum_{k=1}^{m} x_k^2 &= \sum_{k=1}^{m} x_k y_k \end{aligned} \tag{11.15}$$

You may have seen this system of equations before, as it is commonly used in statistics when dealing with the topic of linear regression.


Example 11.3.1. Now we will have Mathematica compute the solution of the system above for the data of Example 11.2.1 and compare it with the pseudoinverse solution found there. In the bookstore example, the number of points was m = 12:

S = {{415, 13.25}, {372, 13.85}, {391, 13.60}, {428, 13.15}, {403,
13.65}, {350, 14.05}, {362, 13.90}, {410, 13.20}, {385, 13.60}, {434,
13.00}, {465, 12.75}, {380, 13.65}};


Sumx = Sum[S[[j, 1]], {j, 1, 12}]
4795


Sumy = Sum[S[[j, 2]], {j, 1, 12}]
161.65


Sumxy = Sum[S[[j, 1]] S[[j, 2]], {j, 1, 12}]
64452.8


Sumxsqrd = Sum[S[[j, 1]]^2, {j, 1, 12}]
1927933


(Coeffs = {{Sumxsqrd, Sumx}, {Sumx, 12}}) // MatrixForm
1927933 4795
4795 12


(RHS = {{Sumxy}, {Sumy}}) // MatrixForm
64452.8
161.65


Inverse[Coeffs].RHS // MatrixForm
-0.0117255
18.1561


This says that the least-squares solution for the line of best fit to this bookstore's data is

$$p(x) = -0.0117254892x + 18.156143$$

Comparing this to the solution given in equation (11.6), we see that the least-squares solution is the same as the pseudoinverse approximate solution to the overdetermined system.
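The same coefficients can be obtained without forming the sums by hand. The following sketch relies on the built-in functions LeastSquares and Fit; the names F, y, ls, and line are chosen here for illustration and do not come from the text.

```mathematica
(* Bookstore data from Example 11.2.1 *)
S = {{415, 13.25}, {372, 13.85}, {391, 13.60}, {428, 13.15},
     {403, 13.65}, {350, 14.05}, {362, 13.90}, {410, 13.20},
     {385, 13.60}, {434, 13.00}, {465, 12.75}, {380, 13.65}};

(* Rows {x_k, 1}, so the unknown vector is {a1, a0} *)
F = Table[{S[[k, 1]], 1}, {k, 1, Length[S]}];
y = S[[All, 2]];

(* Least-squares solution of the overdetermined system F.a == y *)
ls = LeastSquares[F, y]

(* The same line, returned as an expression in x *)
line = Fit[S, {1, x}, x]
```

The list ls is ordered as slope first and intercept second, matching the ordering of the matrix Coeffs in this example.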


If the preceding approach, and subsequent linear example, is too complicated or confusing in showing that the pseudoinverse solution corresponds to the solution that minimizes the least-squared deviation, then we will try one last approach. Sometimes a geometric representation of a problem can yield a better understanding of how it can be solved. To make this approach as simple as possible, we start with the matrix equation $A\vec{x} = \vec{b}$. Here, $A \in \mathbb{R}^{m \times n}$, $\vec{x} \in \mathbb{R}^{n}$, and $\vec{b} \in \mathbb{R}^{m}$. To make the matrix equation correspond to an overdetermined system, we once again assume that m > n. Let $T: \mathbb{R}^{n} \to \mathbb{R}^{m}$ be the linear map with standard matrix A representing it, where the rank of the matrix A is n. The rank of the matrix A is n if and only if the n columns of A form an independent set in $\mathbb{R}^{m}$. These n columns of the matrix A form a basis $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_n\}$ for a subspace S of $\mathbb{R}^{m}$. This subspace S is the image of T, which means that

$$S = \operatorname{Im}(T) = \{A\vec{x} \mid \vec{x} \in \mathbb{R}^{n}\}$$

If $\vec{b} \notin S$, then no solution to the original matrix equation exists. Now the squared deviation $D(\vec{x})$ is the distance squared from $\vec{b}$ to an element $A\vec{x}$ of the subspace S = Im(T). So $D(\vec{x}) = \|\vec{b} - A\vec{x}\|^2$ for all $\vec{x} \in \mathbb{R}^{n}$. The question then becomes: What vector in S lies closest to $\vec{b}$? The shortest distance between $\vec{b}$ and S occurs when $A\vec{x} = \operatorname{proj}_S(\vec{b})$, since $\vec{b} - \operatorname{proj}_S(\vec{b})$ is orthogonal to all of S. In Section 8.4, we constructed a formula for the projection of a vector onto an entire subspace:

$$\operatorname{proj}_S(\vec{b}) = A\left[(A^TA)^{-1}A^T\right]\vec{b} \tag{11.16}$$

What this means is that, since $A\vec{x} = \vec{b}$ can be satisfied only if $\vec{b} \in S$, the next best thing is to instead solve the equation

$$A\vec{x} = \operatorname{proj}_S(\vec{b}) \tag{11.17}$$

Clearly this equation has a solution, and this solution $\vec{x}$ gives the least-squared deviation to the original problem. So substituting (11.16) into (11.17) gives the following equation:

$$A\vec{x} = A\left[(A^TA)^{-1}A^T\right]\vec{b}$$

Note that the pseudoinverse solution $\vec{x} = P(A)\vec{b}$ satisfies this equation, since $P(A) = (A^TA)^{-1}A^T$. Thus, the least-squares solution $\vec{x}$ is the same as the pseudoinverse solution $P(A)\vec{b}$.
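The orthogonality claim behind this argument is easy to test on a concrete system. In the sketch below (an illustration that uses the rocket data of Example 11.2.2 and the built-in PseudoInverse as a stand-in for the text's PseudoInv), the residual $\vec{b} - A\vec{x}$ should be orthogonal to every column of A, and $A\vec{x}$ should agree with the projection formula (11.16). The names xLS, residual, orthCheck, and projCheck are hypothetical.

```mathematica
(* Overdetermined system from Example 11.2.2: nine equations, three unknowns *)
S = {{0, 925}, {2, 1005}, {4, 1135}, {6, 1215}, {8, 1345},
     {10, 1265}, {12, 1130}, {14, 975}, {16, 795}};
A = Table[{S[[k, 1]]^2, S[[k, 1]], 1}, {k, 1, Length[S]}];
b = S[[All, 2]];

(* Pseudoinverse (least-squares) solution, computed exactly *)
xLS = PseudoInverse[A].b;

(* The residual should be orthogonal to each column of A *)
residual = b - A.xLS;
orthCheck = Transpose[A].residual

(* A.xLS should equal the projection of b onto the column space, formula (11.16) *)
proj = A.Inverse[Transpose[A].A].Transpose[A].b;
projCheck = (A.xLS == proj)
```

With exact integer data, orthCheck should be the zero vector and projCheck should be True, which is the geometric statement that $A\vec{x}$ is the point of S closest to $\vec{b}$.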




Homework Problems 


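1. Let $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$ be a dataset of m points. Let $\bar{x}$ and $\bar{y}$ be the averages, respectively, of the x-coordinates and y-coordinates from this dataset, and let

$$\vec{x} = (x_1, x_2, \ldots, x_m), \qquad \vec{y} = (y_1, y_2, \ldots, y_m)$$

be the two vectors formed from these coordinates.
(a) Show that the equation of the least-squared line of best fit to this dataset is $y = a_0 + a_1x$, where

$$a_0 = \frac{\bar{y}\,\|\vec{x}\|^2 - \bar{x}\,(\vec{x} \cdot \vec{y})}{\|\vec{x}\|^2 - m\bar{x}^2}, \qquad a_1 = \frac{\vec{x} \cdot \vec{y} - m\bar{x}\bar{y}}{\|\vec{x}\|^2 - m\bar{x}^2}$$

(Hint: We can apply Cramer's rule.)
(b) Show that if $\bar{x} = 0$, then

$$a_0 = \bar{y}, \qquad a_1 = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|^2}$$

(c) Using part (a), show that the least-squared error in this fit is

$$D = \sum_{k=1}^{m} (y_k - a_0 - a_1x_k)^2 = \|\vec{y}\|^2 + a_1^2\|\vec{x}\|^2 + 2a_0m(a_1\bar{x} - \bar{y}) - 2a_1\,\vec{x} \cdot \vec{y} + ma_0^2$$

Does this simplify any further if you replace $a_0$ and $a_1$ by their formulas from part (a)?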


(d) Use the formulas above to find the equation of the 
least-squared line of best fit and the least-squared error in this 
fit for the dataset 


{(—3, 11), (2,4), (5, —1), (9, —7)} 


Plot this least-squared line of best fit with the dataset. 
2. Let $\{(x_1, y_1, z_1), (x_2, y_2, z_2), \ldots, (x_m, y_m, z_m)\}$ be a dataset of m points. Let $\bar{x}$, $\bar{y}$, and $\bar{z}$ be the averages, respectively, of their x-, y-, and z-coordinates from this dataset. Let

$$\vec{x} = (x_1, x_2, \ldots, x_m), \qquad \vec{y} = (y_1, y_2, \ldots, y_m), \qquad \vec{z} = (z_1, z_2, \ldots, z_m)$$

be the three vectors formed from these coordinates.
(a) Find the equation $z = a_0 + a_1x + a_2y$ of the least-squared plane of best fit to this dataset and formulas similar to those in problem 1 for $a_0$, $a_1$, and $a_2$. (Hint: We can apply Cramer's rule.)


(b) Find a formula for the least-squared error D in this fit 
similar to that of problem 1. 

(c) Use the formulas of parts (a) and (b) to find the equation of 
the least-squared plane of best fit and the least-squared error in 
this fit for the following dataset: 


{(—3, 11,6), (2,4, —1), (5, ~1, —2), (9, -7, 8), (—1, —11, —4)} 


3. Thus far, we have not fit datapoints to a circle of radius r and center (a, b), given by the expression $(x - a)^2 + (y - b)^2 = r^2$. The complication in trying to solve this problem is that the unknowns a and b do not appear linearly: $x^2 - 2ax + a^2 + y^2 - 2yb + b^2 = r^2$. However, if we rewrite this expression as

$$2ax + 2yb + c = x^2 + y^2$$

where $c = r^2 - a^2 - b^2$, then we can solve the system for a, b, and c. Explain how this new linear system of equations can be used to solve for the third unknown r from the original equation for the circle.

4. Determine whether a method similar to that of problem 3 can be applied to construct a system of linear equations for an ellipse of the form

$$\frac{(x - a)^2}{A^2} + \frac{(y - b)^2}{B^2} = 1$$

for unknown constants a, b, A, and B.




Mathematica Problems 


1. Use homework problem 3 to find the circle of best fit to the given 
datasets: 


(a) {(1.0, 0.75), (1.4, 0.92), (2.0, 1.03), (2.6, 0.93), (0.2, -1.82), (0.8, -2.61),
(1.7, -2.93), (3.2, 2.67), (2.3, -2.89), (3.6, 0.23)}


(b) {(-4.7, -1.54), (-4.4, -3.11), (-4.1, -0.68), (-3.9, -3.38),
(-3.5, -3.42), (-2.9, -0.24), (-2.5, -3.61), (-2.1, -3.43),
(-1.8, -0.74), (-1.7, -3.12)}


(c) {(-3.9, 2.72), (-3.4, 0.22), (-2.7, 4.51), (1.9, 0.88), (-1.1, -1.01),
(-0.3, 4.93), (0.1, -0.76), (0.7, -0.45), (1.1, 4.13), (1.8, 0.97)}


2. The Mathematica command to carry out various kinds of 
least-squared fits is given by Fit[Data, funs, vars]. In order to do 
least-squared lines y = ax + b of best fit to a dataset S, you would 
use the following command: 


Fit[S, {1, x}, x]


(a) Use this Mathematica command to find the least-squared 
line of best fit to the data in homework problem 1. Plot these 
data with their line of best fit. What is the least-squared error D 
in this fit? 

(b) Use this Mathematica command to find the least-squared 
line of best fit to the dataset 


{(4217, 13.72), (3825, 14.03), (4106, 13.89),
(4391, 13.44), (3937, 13.95), (4569, 13.15)}


Plot these data with their line of best fit. What is the 
least-squared error D in this fit? 

(c) If the data in part (b) represent six consecutive months’ 
worth of demand data for a bookstore, where x is the number of 
books sold per month and y is their average sale price, then 
how many books will the store sell in a month, where its 
average sale price for the month has been set at $13.50? 




3. (Refer to problem 2.) If we wished to use the Fit command to 
find a plane of best fit, z = ax + by + c, for a set of data S, we could 
use the following Fit: 


Fit[S, {1, x, y}, {x, y}]


(a) Find the equation of the least-squared plane of best fit and 
the least-squared error in this fit for the dataset from homework 
problem 2 (c). 
(b) Plot this least-squared plane of best fit with this dataset. 
4. (Hint: This problem is related to the prime number theorem.) Let x be a positive real variable and the function $\pi(x)$ be the number of primes $\le x$.
(a) Using the Mathematica function PrimePi, compute the values of $\pi(x)$ and then construct the dataset whose points are $\left(x, \frac{x}{\pi(x)}\right)$ for $x = 10^k$, where k = 6, 7, ..., 21.
(b) Plot these data with the equation of the least-squared line of 
best fit to these data. 
(c) Find the least-squared error D in this fit. 
(d) Instead of a least-squared line of best fit, use instead the 
least-squared logarithm of best fit y = a + b In(x) to these data. 
Plot this least-squared logarithm of best fit with the data, and 
find their least-squared error D. 


5. Let $f(x) = \ln\left(\sum_{k=1}^{50} k^x\right)$, where x is a real variable. The reason for using the logarithm is that the values of these sums become very large as x increases, and the logarithm will return much smaller, more reasonable values to approximate by a best fit.
(a) Plot the function f(x) for x € [-5, 10]. Are there one or more 
parts of this graph that seem linear? 
(b) Construct the dataset of the points on the graph of this 
function for x e [0,10] at half-units of x. Now find the 
least-squared line of best fit to this dataset as well as its 
least-squared error D. 
(c) Plot this line of best fit with the original function f(x) for x 
e [0,10]. 




r — 50 Ae d 
6. Let f(z) = k=l k , where x is a real variable. 
(a) Plot the function f(x) for x € [-10, 10]. Is there a part(s) of 
1 


this graph that seems like z7? 

(b) Construct the dataset of the points on the graph of this
function for $x \in [-10, -2]$, at half-units of x. Now find the
least-squared linear combination

\[ y = a + \frac{b}{x^2} + \frac{c}{x^4} + \frac{d}{x^6} + \frac{e}{x^8} + \frac{g}{x^{10}} \]

of best fit to this dataset as well as its least-squared error D.
(c) Plot this least-squared linear combination of best fit with the
original function f(x) for $x \in [-10, -2]$.
7. Find some reasonably accurate world population data from the 
Internet by specific years. 
(a) Now find a least-squared linear combination of best fit to 
these data. You might want to plot these data first to have some 
idea of what sort of functions to use in your linear combination. 
(b) Plot these data with your least-squared linear combination 
of best fit and give its least-squared error D. 




Chapter 12 


Eigenvalues and 
Eigenvectors 


12.1 What Are Eigenvalues 
and Eigenvectors, and Why 
Do We Need Them? 


The answer to both of the questions posed in the title above lies in the 
answer to the following simple question: What is the nicest general type of 
square matrix for doing arithmetic with matrices, in particular, 
multiplication and finding inverses? The nicest general type of square 
matrix is the diagonal matrix, such as the identity matrices. The reason is 
that when you multiply two diagonal matrices of the same size, you 
merely need to multiply corresponding entries (the order of multiplication 
will not matter for diagonal matrices), and when you invert a diagonal 
matrix you only need take the reciprocals of the diagonal entries assuming 
that all of them are nonzero. As well, the determinant of a diagonal matrix 
is the product of its diagonal entries. 
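
The three facts just stated can be summarized symbolically. The shorthand $\operatorname{diag}(a_1, \dots, a_n)$ for the diagonal matrix with diagonal entries $a_1, \dots, a_n$ is our own notation for this sketch and is not used elsewhere in the text:

```latex
\begin{aligned}
\operatorname{diag}(a_1,\dots,a_n)\,\operatorname{diag}(b_1,\dots,b_n)
  &= \operatorname{diag}(a_1 b_1,\dots,a_n b_n), \\
\operatorname{diag}(a_1,\dots,a_n)^{-1}
  &= \operatorname{diag}(1/a_1,\dots,1/a_n) \quad \text{(all } a_j \neq 0\text{)}, \\
\det \operatorname{diag}(a_1,\dots,a_n)
  &= a_1 a_2 \cdots a_n .
\end{aligned}
```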


Let us do a few examples to see that this is all true. In the following
example, and all that follow it in this chapter, we use Diag instead of D to
denote a diagonal matrix, since Mathematica uses the letter D as a reserved
symbol for the differential operator $\frac{d}{dx}$, and so we cannot use D
to name or define anything in Mathematica.




Example 12.1.1. In this first example, we will compute with Mathematica the following 
products, powers, inverses, and roots of diagonal matrices to see that the result is also a 
diagonal matrix. This example will show you the extreme simplicity of doing matrix 
arithmetic with diagonal matrices: 


(A = {{5, 0, 0}, {0, -9, 0}, {0, 0, 7}}) // MatrixForm

5    0    0
0   -9    0
0    0    7

(B = {{π, 0, 0}, {0, 4/11, 0}, {0, 0, -3}}) // MatrixForm

π    0       0
0    4/11    0
0    0      -3

A.B // MatrixForm

5π    0        0
0    -36/11    0
0     0      -21

B.A // MatrixForm

5π    0        0
0    -36/11    0
0     0      -21

Inverse[A] // MatrixForm

1/5    0      0
0     -1/9    0
0      0      1/7

Inverse[B] // MatrixForm

1/π    0       0
0      11/4    0
0      0      -1/3

MatrixPower[A, 5] // MatrixForm

3125    0        0
0      -59049    0
0       0        16807

{{5^5, 0, 0}, {0, (-9)^5, 0}, {0, 0, 7^5}} // MatrixForm

3125    0        0
0      -59049    0
0       0        16807

(Diag = {{a, 0, 0}, {0, b, 0}, {0, 0, c}}) // MatrixForm

a    0    0
0    b    0
0    0    c

MatrixPower[Diag, 10] // MatrixForm

a^10    0       0
0       b^10    0
0       0       c^10

(G = {{10, 0, 0}, {0, 100, 0}, {0, 0, 1000}}) // MatrixForm

10    0      0
0     100    0
0     0      1000

(FourthRootG = {{10^.25, 0, 0}, {0, 100^.25, 0}, {0, 0, 1000^.25}}) // MatrixForm

1.77828    0          0
0          3.16228    0
0          0          5.62341

MatrixPower[FourthRootG, 4] // MatrixForm

10.    0       0
0      100.    0
0      0       1000.

From Example 12.1.1, we see that all the statements made, about how nice 
diagonal matrices are, were correct. If only all square matrices were 
diagonal, then all of our work with matrices would be so much easier! 
Specifically, if we needed to take the power of a square matrix or even the 
exponential of a square matrix, we could just move the operation in 
question to the diagonal entries to get our result. We refer back to the 
homework problems of Section 5.1 for more information on diagonal 
matrix properties. 


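Example 12.1.2. Next, we look at the exponential $e^A$, starting with a 3 x 3 diagonal
matrix

\[ A = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \tag{12.1} \]

First, we need to define what we mean by $e^A$. The exponential function, in terms of the
scalar variable x, can be defined by the Maclaurin series

\[ e^x = \sum_{k=0}^{\infty} \frac{1}{k!} x^k \tag{12.2} \]

which converges for all $x \in \mathbb{C}$. There should be no reason why the variable x cannot be
replaced by a matrix A. In particular, if we use the definition of A in equation (12.1) with
equation (12.2), we get

\[ e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k \tag{12.3} \]

However, from Example 12.1.1, we know that

\[ A^k = \begin{pmatrix} a^k & 0 & 0 \\ 0 & b^k & 0 \\ 0 & 0 & c^k \end{pmatrix} \]

so that (12.3) can now be expressed as follows:

\[
\begin{aligned}
e^A &= \sum_{k=0}^{\infty} \frac{1}{k!} A^k \\
    &= \sum_{k=0}^{\infty} \frac{1}{k!} \begin{pmatrix} a^k & 0 & 0 \\ 0 & b^k & 0 \\ 0 & 0 & c^k \end{pmatrix} \\
    &= \begin{pmatrix} \sum_{k=0}^{\infty} \frac{a^k}{k!} & 0 & 0 \\ 0 & \sum_{k=0}^{\infty} \frac{b^k}{k!} & 0 \\ 0 & 0 & \sum_{k=0}^{\infty} \frac{c^k}{k!} \end{pmatrix}
\end{aligned}
\]

So the exponential operation applied to a diagonal matrix can be moved to its diagonal
entries:

\[ A = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \;\Longrightarrow\; e^A = \begin{pmatrix} e^a & 0 & 0 \\ 0 & e^b & 0 \\ 0 & 0 & e^c \end{pmatrix} \tag{12.4} \]

Clearly, the prior argument shows that this is independent of the square dimension of A,
and can be extended to diagonal matrices of $\mathbb{C}^{n \times n}$. Recall that the exponential $e^{a+bi}$ of
a complex number a + bi is defined as

\[ e^{a+bi} = e^a \left( \cos(b) + i \sin(b) \right) \]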
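
As a quick check of equation (12.4), the following sketch compares Mathematica's built-in MatrixExp applied to a symbolic diagonal matrix with the matrix obtained by exponentiating the diagonal entries directly. The name expDiag is our own choice for illustration; Diag is used because D is reserved.

```mathematica
(* Symbolic 3 x 3 diagonal matrix; Diag is used because D is reserved *)
Diag = DiagonalMatrix[{a, b, c}];

(* Matrix exponential computed by Mathematica *)
expDiag = MatrixExp[Diag];

(* Compare with exponentiating each diagonal entry on its own *)
Simplify[expDiag == DiagonalMatrix[Exp[{a, b, c}]]]
```

If the two matrices agree, the last line evaluates to True.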


Our next question is: How can we do this for a general square matrix that
is not diagonal? From equation (12.3), we see that it is necessary to first
find an easy way to compute the powers of A by relating A to some
diagonal matrix D. The beginning of our answer lies in the following
interesting fact about taking the power of a certain type of product of
square matrices. Namely, if D is any square matrix and Q is any invertible
square matrix of the same size as D, then for any positive integer n we
have

\[ (QDQ^{-1})^n = QD^nQ^{-1} \]

A theorem and proof of this fact follow next. Note that since matrix
multiplication is not typically commutative, $QDQ^{-1}$ is not generally D.
Two matrices D and $QDQ^{-1}$ are called similar matrices since they both
have the same determinant and other properties in common; think of this
as analogous to similar triangles.


Theorem 12.1.1. For any square matrix D, and any invertible square
matrix Q of the same size as D, we have for any positive integer n that

\[ (QDQ^{-1})^n = QD^nQ^{-1} \]

Proof. In order to see that this formula is valid, all we need to do is expand
out completely $(QDQ^{-1})^n$. For simplicity, let us take n = 3 so that you get
the idea without the mess. We can then apply induction to show that it is
valid for arbitrary n:

\[
\begin{aligned}
(QDQ^{-1})^3 &= (QDQ^{-1})(QDQ^{-1})(QDQ^{-1}) \\
             &= QD(Q^{-1}Q)D(Q^{-1}Q)DQ^{-1} \\
             &= QD(I)D(I)DQ^{-1} \\
             &= QDDDQ^{-1} \\
             &= QD^3Q^{-1}
\end{aligned}
\]

If n > 3, then writing out this product will give n - 1 products $Q^{-1}Q$,
which all disappear to leave $D^n$ between Q and $Q^{-1}$.
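
Theorem 12.1.1 is easy to test numerically. The sketch below uses a small invertible Q and a non-diagonal square matrix of our own choosing (named Dm here, since D is reserved) and compares both sides of the formula for n = 5. The names Dm, lhs, and rhs are assumptions made for this illustration only.

```mathematica
Q = {{2, 1}, {1, 1}};        (* invertible, since Det[Q] = 1 *)
Dm = {{1, 2}, {3, 4}};       (* any square matrix of the same size as Q *)
n = 5;

lhs = MatrixPower[Q.Dm.Inverse[Q], n];     (* (Q Dm Q^-1)^n *)
rhs = Q.MatrixPower[Dm, n].Inverse[Q];     (* Q Dm^n Q^-1 *)

lhs == rhs
```

Because all entries are exact integers, the comparison is exact rather than approximate.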


This tells us that if we have a square matrix A and we can find two square
matrices D and Q of the same size as A with Q invertible and D diagonal
where $A = QDQ^{-1}$, then $A^n = QD^nQ^{-1}$, where we know how to compute
$D^n$, since D is diagonal. So now we know how to compute $A^n$ if only we
can find matrices D and Q with D diagonal and Q invertible.


How do we find two square matrices D and Q of the same size as A with
Q invertible and D diagonal where $A = QDQ^{-1}$? If both Q and D exist for a
given square matrix A so that $A = QDQ^{-1}$, then we say that A is
diagonalizable. Not every square matrix A is diagonalizable, but almost all
are.


To begin, however, we can express this equation $A = QDQ^{-1}$ as AQ = QD
instead, simply by right-multiplying both sides of the equation by Q. We
therefore need to know what the products AQ and QD look like when D is a
diagonal matrix. All we need is an example to see what happens for both of
these products; this we will do in Example 12.1.3.


Example 12.1.3. Consider the following matrices:

\[ D = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}, \quad Q = \begin{pmatrix} d & e & f \\ g & h & i \\ j & k & l \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \]

DiagD = {{a, 0, 0}, {0, b, 0}, {0, 0, c}};
Q = {{d, e, f}, {g, h, i}, {j, k, l}};
A = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
Q.DiagD // MatrixForm

a d    b e    c f
a g    b h    c i
a j    b k    c l

A.Q // MatrixForm

d + 2 g + 3 j      e + 2 h + 3 k      f + 2 i + 3 l
4 d + 5 g + 6 j    4 e + 5 h + 6 k    4 f + 5 i + 6 l
7 d + 8 g + 9 j    7 e + 8 h + 9 k    7 f + 8 i + 9 l

(QColumn1 = Q[[All, 1]]) // MatrixForm

d
g
j

A.QColumn1 // MatrixForm

d + 2 g + 3 j
4 d + 5 g + 6 j
7 d + 8 g + 9 j

(QColumn2 = Q[[All, 2]]) // MatrixForm

e
h
k

A.QColumn2 // MatrixForm

e + 2 h + 3 k
4 e + 5 h + 6 k
7 e + 8 h + 9 k

(QColumn3 = Q[[All, 3]]) // MatrixForm

f
i
l

A.QColumn3 // MatrixForm

f + 2 i + 3 l
4 f + 5 i + 6 l
7 f + 8 i + 9 l

So QD, for D a diagonal matrix, is the matrix whose columns are the columns of Q
multiplied by the corresponding diagonal entries of D.


In addition, the first column of the product AQ is A times the first column of Q by the
way matrix multiplication works and what we see above. Similarly, the second (or third)
column of the product AQ is A times the second (or third) column of Q.


In conclusion, we have now learned that AQ = QD is really a system of three equations
concerning the columns of Q. It really says that A times the kth column of Q equals the
kth entry on the diagonal of D times the kth column of Q for k = 1, 2, 3. In other words,
we have the three matrix equations

\[ A \begin{pmatrix} d \\ g \\ j \end{pmatrix} = a \begin{pmatrix} d \\ g \\ j \end{pmatrix}, \quad A \begin{pmatrix} e \\ h \\ k \end{pmatrix} = b \begin{pmatrix} e \\ h \\ k \end{pmatrix}, \quad A \begin{pmatrix} f \\ i \\ l \end{pmatrix} = c \begin{pmatrix} f \\ i \\ l \end{pmatrix} \]

which we can conveniently express as

\[ (A - aI_3) \begin{pmatrix} d \\ g \\ j \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad (A - bI_3) \begin{pmatrix} e \\ h \\ k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad (A - cI_3) \begin{pmatrix} f \\ i \\ l \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

where $I_3$ is the 3 x 3 identity matrix. So, to find the two matrices D and Q from the matrix
A so that AQ = QD, we must find the numbers on the diagonal of D and the columns of
Q. Note that these three equations are really the same type of equation and of the form: A
times the first (second or third) column of Q is equal to the first (second or third) entry of
D times the first (second or third) column of Q.


Note that Q must have an inverse, and so no column of Q can be a column of all zeros
since then the determinant of Q would be zero and Q would not have an inverse.
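
The column-by-column reading of AQ = QD can be written compactly. In the sketch below, $X_1, X_2, X_3$ denote the three columns of Q; this labeling is our own shorthand for the display:

```latex
AQ = A \begin{pmatrix} X_1 & X_2 & X_3 \end{pmatrix}
   = \begin{pmatrix} AX_1 & AX_2 & AX_3 \end{pmatrix},
\qquad
QD = \begin{pmatrix} X_1 & X_2 & X_3 \end{pmatrix}
     \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}
   = \begin{pmatrix} aX_1 & bX_2 & cX_3 \end{pmatrix}.
```

Equating the two products column by column gives $AX_1 = aX_1$, $AX_2 = bX_2$, and $AX_3 = cX_3$.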


From Example 12.1.3, we see that a square n x n matrix A can be written
as $A = QDQ^{-1}$, or equivalently AQ = QD, for a diagonal matrix D and an
invertible matrix Q exactly when $AX_k = D_{k,k}X_k$, where $D_{k,k}$ is the kth entry
on the diagonal of D and $X_k$ is the kth column of Q. If we switch notation
and let $\lambda_k = D_{k,k}$, then our equation is $AX_k = \lambda_k X_k$, or simply $AX = \lambda X$ if we
drop the subscript, which is also the matrix equation $(A - \lambda I)X = 0$, where
$\lambda$ is a complex scalar variable whereas X is a nonzero column matrix
variable in $\mathbb{C}^n$.


Now let $\lambda$ be one of the diagonal entries of D and X be a column of Q.
Then we want to solve the equation $AX = \lambda X$ for both X and $\lambda$, where $\lambda$
goes on the diagonal of D in the same location that X goes into Q as a
column. It might seem that we cannot solve for these two unknowns when
we have only a single equation, but amazingly we can do it.


By the way, a value for $\lambda$ is called an eigenvalue of A, where the column X,
which goes with this $\lambda$, is called an eigenvector for the eigenvalue $\lambda$. The
German word eigen means characteristic.


In order to solve the equation $AX = \lambda X$, we need to rewrite it as $(A - \lambda I)X = 0$,
where I is the identity matrix of the same size as A, D, and Q, and 0 is
the zero column of the same size as X. Remember that X cannot be the
zero column since it must go into Q, which has to have an inverse. For the
matrix equation $(A - \lambda I)X = 0$ to have a nonzero solution X (think of $\lambda$ as
being fixed for now), the matrix $A - \lambda I$ cannot have an inverse, or
equivalently $\det(A - \lambda I) = 0$. If $\det(A - \lambda I) \neq 0$, then $A - \lambda I$ has an inverse
and the equation $(A - \lambda I)X = 0$ has as its only solution $X = (A - \lambda I)^{-1}0 = 0$,
which is not possible for any column of Q.


Hence, we solve the equation $AX = \lambda X$ for X and $\lambda$ by first solving the
equation $\det(A - \lambda I) = 0$ for $\lambda$, followed by solving the equation $(A - \lambda I)X = 0$
for a nonzero column X. The $\lambda$s will make up the diagonal entries of D,
while their corresponding Xs go in as the columns of Q in the same
location as where the $\lambda$s are placed. It is usual that for each $\lambda$ we get
"essentially" a single solution for X.
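
The two-step recipe above (eigenvalues first, then eigenvectors, then assemble D and Q) can be sketched in Mathematica with the built-in Eigensystem, which returns the list of eigenvalues and a matching list of eigenvectors. The helper name diagonalize is hypothetical and introduced only for this illustration; it assumes the input matrix is diagonalizable, since otherwise the returned Q is not invertible.

```mathematica
(* Hypothetical helper: returns {Q, Diag} with A == Q.Diag.Inverse[Q],
   assuming A is diagonalizable *)
diagonalize[A_?SquareMatrixQ] := Module[{vals, vecs},
  {vals, vecs} = Eigensystem[A];            (* eigenvalues and eigenvectors *)
  {Transpose[vecs], DiagonalMatrix[vals]}   (* eigenvectors become columns of Q *)
]

A = {{12, -2}, {-7, -1}};
{Q, Diag} = diagonalize[A];

Q.Diag.Inverse[Q] == A
```

Eigensystem returns each eigenvector as a row, which is why the sketch transposes the result so that the kth column of Q pairs with the kth diagonal entry of Diag.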




Example 12.1.4. We will attempt to highlight some of the above-mentioned concepts
with a simple 2 x 2 matrix. Let

\[ A = \begin{pmatrix} 12 & -2 \\ -7 & -1 \end{pmatrix} \]

We will next check whether A is diagonalizable by looking for the diagonal matrix D and
invertible matrix Q such that $A = QDQ^{-1}$. We must first find A's eigenvalues by solving
$\det(A - \lambda I) = 0$ for $\lambda$. This equation will be a second-order polynomial equation in the
variable $\lambda$, where the polynomial $P(\lambda) = \det(A - \lambda I)$ is called the characteristic
polynomial of A, or sometimes the eigenpolynomial of A. The eigenvalues of A are the
roots (or zeros) of $P(\lambda)$. Then we find each $\lambda$'s nonzero eigenvector X by solving the matrix
equation $(A - \lambda I)X = 0$:


(A = {{12, -2}, {-7, -1}}) // MatrixForm

12    -2
-7    -1

(AMinusλI = A - λ IdentityMatrix[2]) // MatrixForm

12 - λ    -2
-7        -1 - λ

EigenPoly = Det[AMinusλI]
-26 - 11 λ + λ^2

Solve[EigenPoly == 0, λ]
{{λ -> -2}, {λ -> 13}}

λ1 = 13; λ2 = -2;
(X = {{x}, {y}}) // MatrixForm

x
y

(AMinusλI /. {λ -> λ1}) // MatrixForm

-1    -2
-7    -14

AMinusλI.X /. {λ -> λ1} // MatrixForm

-x - 2 y
-7 x - 14 y

Solve[{Evaluate[(AMinusλI.X /. {λ -> λ1}) == 0]}, {x, y}]
Solve::svars : Equations may not give solutions for all "solve" variables. >>

{{x -> -2 y}}

AMinusλI.X /. {λ -> λ2} // MatrixForm

14 x - 2 y
-7 x + y

Solve[{Evaluate[(AMinusλI.X /. {λ -> λ2}) == 0]}, {x, y}]
Solve::svars : Equations may not give solutions for all "solve" variables. >>

{{y -> 7 x}}

(DiagA = {{λ1, 0}, {0, λ2}}) // MatrixForm

13    0
0    -2

(Q = {{-2, 1}, {1, 7}}) // MatrixForm

-2    1
1     7

Q.DiagA.Inverse[Q] // MatrixForm

12    -2
-7    -1

A // MatrixForm

12    -2
-7    -1


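The eigenvalues of A have been found to be $\lambda_1 = 13$ and $\lambda_2 = -2$, which are the roots of
the characteristic polynomial $P(\lambda)$ of A:

\[
\begin{aligned}
P(\lambda) &= \det(A - \lambda I) \\
           &= \det \begin{pmatrix} 12 - \lambda & -2 \\ -7 & -1 - \lambda \end{pmatrix} \\
           &= \lambda^2 - 11\lambda - 26
\end{aligned}
\]

The eigenvector $X_1$, for eigenvalue $\lambda_1 = 13$, is any nonzero solution to the equation
$(A - \lambda_1 I)X = 0$, given explicitly by

\[ \begin{pmatrix} -1 & -2 \\ -7 & -14 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

which is the same as the system of equations

\[ \{-x - 2y = 0,\; -7x - 14y = 0\} \]

This system has as its solution:

\[ x = -2y \]

which allows us to take $X_1 = [-2 \;\; 1]^T$. The eigenvector $X_2$, corresponding to $\lambda_2 = -2$, is
any nonzero solution to the equation $(A - \lambda_2 I)X = 0$, given explicitly by

\[ \begin{pmatrix} 14 & -2 \\ -7 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

which is the same as the system of equations

\[ \{14x - 2y = 0,\; -7x + y = 0\} \]

This system has the following as its solution:

\[ y = 7x \]

which allows us to take $X_2 = [1 \;\; 7]^T$. Hence, the matrix Q and the diagonal matrix D are
given, respectively, by

\[ Q = \begin{pmatrix} -2 & 1 \\ 1 & 7 \end{pmatrix}, \quad D = \begin{pmatrix} 13 & 0 \\ 0 & -2 \end{pmatrix} \]

The Mathematica code has checked that A is diagonalizable, that is, that $A = QDQ^{-1}$.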


Note that there are an infinite number of solutions for the eigenvectors X for each
eigenvalue $\lambda$; we merely need to take one that is not zero as a column of Q. So our matrix
A is diagonalizable. Let us now verify that $A^3 = QD^3Q^{-1}$, as we expect. Let us also see if
$e^A = Qe^DQ^{-1}$, which we expect (and which we will prove in Section 12.3), since the
exponentiation moves to D:


A.A.A // MatrixForm

 2050    -294
-1029     139

(Q.MatrixPower[DiagA, 3].Inverse[Q]) // MatrixForm

 2050    -294
-1029     139

N[Sum[MatrixPower[A, n]/n!, {n, 0, 25}]] // MatrixForm

 412520.    -58931.4
-206260.     29465.9

(eDiagλ = {{E^λ1, 0}, {0, E^λ2}}) // MatrixForm

e^13    0
0       1/e^2

N[Q.eDiagλ.Inverse[Q]] // MatrixForm

 412919.    -58988.4
-206460.     29494.4

N[MatrixExp[A]] // MatrixForm

 412919.    -58988.4
-206460.     29494.4


In Section 12.2, we will discuss diagonalizability in more detail, as well as provide an 
example of a matrix A that is not diagonalizable. See homework problem 4 here if you 
cannot wait until the next section. 


Homework Problems 


1. Find the characteristic polynomial P(A), the eigenvalues, and 
their corresponding independent eigenvectors, for the matrix 


5 —5 
aT R 


If A is diagonalizable, then find the diagonal matrix D and the
invertible matrix Q so that $A = QDQ^{-1}$. Also find $e^A$.


2. Find the characteristic polynomial P(A), the eigenvalues, and 
their corresponding eigenvectors for the matrix 


L35 
A=|0 4 
0 0 




If A is diagonalizable, then find the diagonal matrix D and the
invertible matrix Q so that $A = QDQ^{-1}$. Also find $e^A$.


3. Consider the following complex-valued matrix: 


=l —1+3i 6+2i 
"R 14-198 -91-4- 7 


(a) Compute the eigenvalues of A. 
(b) Find the two complex matrices D and Q so that $A = QDQ^{-1}$.
(c) Check that 47 = OD?Q"! 
(d) Check that A and D have the same determinant. 
(e) Check that $A^{-1} = QD^{-1}Q^{-1}$.
4. Show that both of the following matrices are not diagonalizable. 


5 2 5 pu ss 
@ [5 3 | (b) | 0 5 0 
0 0 5 
5. Let 
i 3 5 
A= 0 4 2 
0 0 7 


Find 2, a, and A‘, Also, find a, A. and 4+. Is there anything 
you want to conjecture about the form of $A^n$ for any integer n?
6. Let 


5 2 
AFA 


Find a formula for $A^n$ for any integer n. Now use it to find a formula
for $e^A$.


7. Extend your last formula to a general exponential formula for $e^A$,
where

\[ A = \begin{pmatrix} \lambda & K \\ 0 & \lambda \end{pmatrix} \]

with $\lambda$ and K any real numbers with $K \neq 0$.
8. Find a formula for $A^n$ for each of the following matrices, then use
this to compute $e^A$:


Nn o 


2 5 0 
of (ce) | 0 5 
005 005 0 0 


an 


9. Using your answers to problem 8, find a formula for $e^A$ for each
of the following matrices. Here, $\lambda$ and K are any real numbers with
$K \neq 0$:


\ K 0 OK 40 0 
(a) |0 A O| wwa 0| (lOorAK 
0 A 0 BoA 0 0 A 


10. If A is a square upper or lower triangular matrix of any size n x
n, then is $e^A$ also of the same type, upper or lower triangular of size
n x n? If you answer yes, then prove it to be true. If you answer no,
then provide an example to show that this is not true.

11. Can you find a formula for $e^A$ when A is either upper or lower
triangular of any size n x n or just the special types mentioned in
previous homework problems?

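12. Let

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

for real constants a, b, c, and d. Show that A's characteristic
polynomial is

\[ P(\lambda) = \lambda^2 - \operatorname{trace}(A)\,\lambda + \det(A) \]

where $\operatorname{trace}(A) = \sum_{j=1}^{n} A_{j,j}$; the trace of a square matrix A is
simply the sum of A's diagonal entries.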
13. What does A's characteristic polynomial look like if A is a
general real 3 x 3 matrix

\[ A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]

instead of the 2 x 2 one from problem 12?
14. Let 


ad 
A=10 b 
0 0 


oO a & 


be a real upper triangular matrix. What is A's characteristic
polynomial $P(\lambda)$, and what are its eigenvalues? Repeat this
assuming that A is lower triangular or diagonal.

15. Let A be a real n x n matrix that is diagonalizable. Prove that A
is invertible if and only if it does not have 0 as an eigenvalue. If A is
diagonalizable with no 0 eigenvalue, then give a formula for $A^{-1}$ in
terms of D and Q.

16. Let A be a real n x n matrix that is invertible. How are the
eigenvalues and eigenvectors of A related to the eigenvalues and
eigenvectors of $A^{-1}$? (See problem 15.)

17. Let A be a real n x n matrix that is diagonalizable with all real
eigenvalues. Is it true that you can find a real invertible matrix Q so
that $A = QDQ^{-1}$? Explain the reasoning behind your answer.

18. Let A be a real square matrix. If $\lambda$ is an eigenvalue of A with an
eigenvector $\vec{x}$, that is, $A\vec{x} = \lambda\vec{x}$, then for any positive integer n,
give an eigenvalue and a corresponding eigenvector for the matrix
$A^n$. What happens if n is a negative integer?

19. Consider the following real-valued matrix: 


\[ A = \frac{1}{186} \begin{pmatrix} -700 & -184 & 200 & -720 \\ -50 & 970 & 373 & 573 \\ 60 & 324 & -336 & 354 \\ -220 & -196 & 116 & 66 \end{pmatrix} \]


(a) Compute the eigenvalues of A. 

(b) Find the two matrices D and Q so that $A = QDQ^{-1}$.
(c) Check that A? = OD*Q! 

(d) Check that A and D have the same determinant. 
(e) Check that $A^{-1} = QD^{-1}Q^{-1}$.


Mathematica Problems 


1. Verify your answers to homework problems 1-3. 
2. Consider the following real valued matrix: 


—1511 —20 322 -—699 921 

1185 972 402 -—915 —3423 
A=>— 494 296 -—772 —42 -1170 
—594 -216 252 534 558 
—759 -84 -30 -75 1545 


(a) Compute the eigenvalues of A. 
(b) Find the two matrices D and Q so that $A = QDQ^{-1}$.
(c) Check that 4? = OD701. 
(d) Check that A and D have the same determinant. 
(e) Check that $A^{-1} = QD^{-1}Q^{-1}$.
3. Use Mathematica to redo homework problem 19. 


Research Projects 


1. Find an algorithm (from a numerical analysis or numerical linear 
algebra book) that allows you to find an approximation to an 
eigenvector X for a given approximate eigenvalue $\lambda$ of a square
matrix A. Now write some Mathematica code that will allow you to
test this algorithm on a specific square matrix A.


2. If you are in a major other than mathematics (engineering, 
biology, chemistry, physics, etc.), then search out an application of 
eigenvectors and eigenvalues that is directly applicable to your 
major. 


12.2 Summary of Definitions 
and Methods for Computing 
Eigenvalues and 
Eigenvectors as Well as the 
Exponential of a Matrix 


Now we have the ability to define eigenvalues and eigenvectors for a
square n x n matrix A whose entries are real or complex numbers. We will
not deal with complex square matrices $A \in \mathbb{C}^{n \times n}$ much in this chapter,
since in many applications the matrix A is completely real.


Definition 12.2.1. Let A be an n x n matrix with complex entries. Then a
complex number $\lambda$ is called an eigenvalue for the matrix A if there exists a
nonzero complex column vector $\vec{x} \in \mathbb{C}^n$ satisfying $A\vec{x} = \lambda\vec{x}$. The
vector $\vec{x}$ is called an eigenvector of the matrix A for the eigenvalue $\lambda$.


The method for computing the eigenvalues and eigenvectors of the matrix
A is given after the next definition for the characteristic polynomial $P(\lambda)$
for A, where $\lambda$ is the variable in this polynomial.


Definition 12.2.2. The characteristic polynomial $P(\lambda)$, for $A \in \mathbb{C}^{n \times n}$, is
defined to be

\[ P(\lambda) = \det(A - \lambda I_n) \]

where $I_n$ is the n x n identity matrix and $\lambda$ is the polynomial's variable.


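Example 12.2.1. As an example, if

\[ A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]

then

\[ P(\lambda) = \det(A - \lambda I_3) = \det \begin{pmatrix} a - \lambda & b & c \\ d & e - \lambda & f \\ g & h & i - \lambda \end{pmatrix} \]

A = {{a, b, c}, {d, e, f}, {g, h, i}};
Collect[Det[A - λ IdentityMatrix[3]], λ]

-c e g + b f g + c d h - a f h - b d i + a e i +
 (b d - a e + c g + f h - a i - e i) λ + (a + e + i) λ^2 - λ^3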

The characteristic polynomial $P(\lambda)$ is of degree n if $A \in \mathbb{C}^{n \times n}$, and so it
has n complex roots if we count each root as many times as its multiplicity
(or power) in the complete factoring of $P(\lambda)$. The n complex roots of $P(\lambda)$
are the eigenvalues of A. If the matrix A is beyond size 2 x 2, then its
characteristic polynomial $P(\lambda)$ is of degree > 2, and so it would be difficult,
if not impossible, to find A's eigenvalues by hand. In general, the
Newton-Raphson algorithm can find the roots of any degree polynomial
quite efficiently as approximations accurate to any desired number of
digits.
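
To make the Newton-Raphson remark concrete, here is a minimal sketch that applies the iteration $t \mapsto t - p(t)/p'(t)$ to the characteristic polynomial of the 2 x 2 matrix from Example 12.1.4. The names p, dp, newtonStep, and root, the starting guess 10., and the cap of 50 iterations are all our own choices for illustration. A different starting guess may converge to the other eigenvalue.

```mathematica
A = {{12, -2}, {-7, -1}};

(* Characteristic polynomial in the variable t, and its derivative *)
p[t_] = Det[A - t IdentityMatrix[2]];
dp[t_] = D[p[t], t];

(* One Newton-Raphson step *)
newtonStep[t_] := t - p[t]/dp[t];

(* Iterate from the guess 10. until the value stops changing (at most 50 steps) *)
root = FixedPoint[newtonStep, 10., 50]
```

Starting near 10, the iterates approach the eigenvalue 13 found exactly in Example 12.1.4.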


For a given eigenvalue $\lambda$ of A, you compute $\lambda$'s eigenvectors $\vec{x}$ as the
nonzero solutions to the matrix equation $(A - \lambda I_n)\vec{x} = \vec{0}$. Remember that
for an eigenvalue $\lambda$ of A, $\det(A - \lambda I_n) = 0$ and the matrix $A - \lambda I_n$ has no
inverse. If you use any approximation of the eigenvalue $\lambda$ of A, then
$\det(A - \lambda I_n) \neq 0$ and the matrix $A - \lambda I_n$ has an inverse; this means that the
matrix equation $(A - \lambda I_n)\vec{x} = \vec{0}$ has only the trivial solution $\vec{x} = \vec{0}$. Thus, no
approximation of an eigenvalue $\lambda$ of A can be used to find A's eigenvectors
$\vec{x}$ by solving the matrix equation $(A - \lambda I_n)\vec{x} = \vec{0}$ for $\vec{x}$. There are
methods for approximating eigenvectors $\vec{x}$ for an approximate eigenvalue
$\lambda$ of A, but we will not discuss them here, and so we must rely on
Mathematica to compute the eigenvectors for us when all we have is an
approximation of an eigenvalue $\lambda$. See research project 3 of Section 12.4 if
you are interested in one possible method for finding eigenvectors.
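
For readers curious what such a method can look like, the following is a minimal sketch of inverse iteration, a standard technique from numerical linear algebra that is not developed in this text. It repeatedly solves $(A - \mu I)\vec{y} = \vec{x}$ with an approximate eigenvalue $\mu$ and normalizes the result. The matrix is the one from Example 12.1.4; the names mu, shifted, step, xApprox, and lambdaApprox, the starting vector, and the count of 10 iterations are assumptions made for this illustration.

```mathematica
A = {{12, -2}, {-7, -1}};
mu = 12.9;                                  (* approximate eigenvalue, near 13 *)
shifted = A - mu IdentityMatrix[2];         (* invertible, since mu is not exact *)

(* One step: solve shifted.y == x for y, then rescale y to unit length *)
step[x_] := Normalize[LinearSolve[shifted, x]];

(* Ten steps from an arbitrary starting vector *)
xApprox = Nest[step, {1., 1.}, 10];

(* Estimate the eigenvalue from the unit vector xApprox *)
lambdaApprox = xApprox.A.xApprox
```

The approximate eigenvalue is used here precisely because $A - \mu I$ is then invertible: each solve amplifies the component of the vector along the eigenvector whose eigenvalue is closest to $\mu$.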


For us, the purpose of the eigenvalues and eigenvectors for a square 
matrix A is to be able to diagonalize it. As such, let us now formally 
define what we mean by A being diagonalizable. 


Definition 12.2.3. Let A be an n x n square matrix with complex entries.
Then A is said to be diagonalizable if there exist an n x n diagonal matrix
D with all complex entries and an n x n matrix Q with all complex entries
that is invertible where $A = QDQ^{-1}$.


The n entries on the diagonal of the matrix D are the eigenvalues of A, 
where an eigenvalue appears on this diagonal as many times as it is a root 
of A’s characteristic polynomial P(A). 


The columns of the matrix Q are independent corresponding eigenvectors
for the eigenvalues in D. This means that the kth column of the matrix Q is
an eigenvector for the eigenvalue at the kth location on the diagonal of D.
This matrix Q exists exactly when we can find a basis of the vector space
$\mathbb{C}^n$ consisting completely of eigenvectors for A.


As it turns out, almost all square n x n matrices A are diagonalizable
because they normally have n distinct eigenvalues [then each root of A's
characteristic polynomial $P(\lambda)$ has multiplicity 1] and each eigenvalue has
at least one eigenvector associated with it. The union of independent sets
of eigenvectors from different eigenvalues always forms an independent
set of column vectors in $\mathbb{C}^n$.
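
One way to test this criterion in Mathematica is to count independent eigenvectors. For exact matrices, Eigenvectors pads its result with zero vectors when there are fewer independent eigenvectors than the size of the matrix, so the rank of the returned list reveals whether a full basis of eigenvectors exists. The helper name hasEigenbasisQ is hypothetical, and the second test matrix is our own example of a repeated eigenvalue.

```mathematica
(* Hypothetical helper: True when A has n independent eigenvectors *)
hasEigenbasisQ[A_] := MatrixRank[Eigenvectors[A]] == Length[A]

hasEigenbasisQ[{{12, -2}, {-7, -1}}]   (* two distinct eigenvalues, 13 and -2 *)
hasEigenbasisQ[{{5, 2}, {0, 5}}]       (* eigenvalue 5 repeated twice *)
```

The first matrix is the diagonalizable one from Example 12.1.4. The second has only one independent eigenvector for its repeated eigenvalue, so the check fails.
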
Square matrices that are diagonalizable are a special subset of those which
are similar. In fact, A is diagonalizable exactly when it is similar to a
diagonal matrix D.


Definition 12.2.4. Two square matrices A and B of the same size are said
to be similar if there exists an invertible matrix Q of the same size as A
and B with $A = QBQ^{-1}$.


Similar matrices have many common properties, and so people have 
classified similar matrices by type or canonical form in order to determine 
when two square matrices can be similar to each other. Study of canonical 
forms is intended for an advanced course in linear algebra, and so it is left 
for your future education and interest. 


Now we give the definition of $e^A$ for any square matrix A using the
Maclaurin series for the function $e^x$. This definition, although identical to
(12.3) in its form, is now applicable to all square matrices.


Definition 12.2.5. Let A be any square matrix. Then its exponential, $e^A$, is
defined by

\[ e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k \]

As we have seen, when A is diagonalizable with $A = QDQ^{-1}$, then $e^A =
Qe^DQ^{-1}$, which makes it much easier to compute $e^A$ than using the infinite
series. If A is not diagonalizable, then $e^A$ must be approximately computed
using the infinite series by taking a partial sum, or if A is of some very
special type of triangular matrix, then we might be able to find a simple
formula for $e^A$ from a formula for the powers of A.
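
The partial-sum approach can be tried on a matrix that is not diagonalizable. The sketch below uses a 2 x 2 upper triangular matrix of our own choosing with a repeated eigenvalue, sums the first 41 terms of the series in Definition 12.2.5, and compares the result with Mathematica's built-in MatrixExp. The names partial and exact and the cutoff of 40 are assumptions for this illustration.

```mathematica
A = {{1, 1}, {0, 1}};     (* repeated eigenvalue 1, only one independent eigenvector *)

(* Partial sum of the series for e^A, terms k = 0 through 40 *)
partial = N[Sum[MatrixPower[A, k]/k!, {k, 0, 40}]];

(* Built-in matrix exponential for comparison *)
exact = N[MatrixExp[A]];

(* Largest entrywise difference between the two results *)
Max[Abs[partial - exact]]
```

The difference should be at the level of machine rounding, since the terms $A^k/k!$ shrink very quickly for a matrix with small entries.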


Homework Problems 


1. If two n x n matrices A and B are similar, then are their
exponentials, $e^A$ and $e^B$, also similar matrices? If yes, explain why;
if no, give an example to verify it.

2. Let A and B be two similar n x n matrices. Show that A and B 
have the same eigenvalues. How are their respective eigenvectors 
related for the same eigenvalue? 

3. Let A and B be two similar n x n matrices where A is
diagonalizable. Show that B is also diagonalizable. [First, explain
why $(EF)^{-1} = F^{-1}E^{-1}$ for two invertible matrices E and F of the
same size.]

4. Let A be a real n x n matrix and c be a real scalar. How are the
eigenvalues and eigenvectors of A related to those of cA? Explain
your answer.


5. Let A and B be two n x n real diagonal matrices. Show that
$e^A e^B = e^{A+B}$.

6. Let A and B be two n x n real matrices with $A = QDQ^{-1}$ and
$B = QFQ^{-1}$ for two diagonal matrices D, F, and the same Q. First, show
that A and B commute, that is, AB = BA; then show that
$e^A e^B = e^{A+B}$.

7. Give two square matrices A and B of the same size satisfying
$e^A e^B \neq e^{A+B}$.

8. Let A be any real diagonalizable matrix and k be any positive
integer. Show that $(e^A)^k = e^{kA}$. Is this also true if k = 0 or k is
negative?

9. Let A be any real square matrix. Show that we have
$e^{KAK^{-1}} = Ke^AK^{-1}$ for any real invertible matrix K the same size as A.

10. Is there a way to define the logarithm, ln(A), of a general real
square matrix A, and if this fails, can you define ln(A) for some
specific type of real square matrix A? If so, give your definition and
check that it works with a specific example for A; that is, check
whether $\ln(e^A) = e^{\ln(A)} = A$. If you do not believe that you can
define ln(A) for any type of real square matrix A, then please
explain your reasoning.


11. Let A be any square n x n matrix. Define the trace of A as

\[ \operatorname{trace}(A) = \sum_{j=1}^{n} A_{j,j} \]


Explain why for any two square n x n matrices A and B, trace(AB) =
trace(BA). (Hint: Use Mathematica to help you understand why.)

12. Show that if A and B are two similar matrices, then trace(A) =
trace(B) and det(A) = det(B). (Hint: See Homework problem 11.)

13. Give an example of two same-size square matrices A and B that
are not similar, where det(A) = det(B) and trace(A) = trace(B).
Hence, the converse of problem 12 is false. (Hint: See problem 11.)
14. Are any two square diagonal matrices A and B of the same size 
always similar? If yes, explain why; if no, explain why not and give 
an example. 

15. Show that

\[ A = \begin{pmatrix} \lambda & K \\ 0 & \lambda \end{pmatrix} \]

for $K \neq 0$ is not diagonalizable for any value of $\lambda$. Can you give two
other versions of this matrix A that are also not diagonalizable?


Research Projects 


1. Investigate the topic of Markov chains from statistics, and see 
how eigenvalues and eigenvectors are helpful in their study. 


12.3 Applications of the 
Diagonalizability of Square 
Matrices 


It is clear from the last two sections that the ability to diagonalize a square
matrix A enables us to easily take powers of the matrix A by $A^n = QD^nQ^{-1}$
if $A = QDQ^{-1}$. Simply put, this tells us that similar matrices have similar
powers for the same similarity matrix Q. As a consequence, this also
allows us to find the exponential of the square matrix A by $e^A = Qe^DQ^{-1}$.
This exponential formula will be very useful to us in solving square
systems of first-order linear differential equations in Section 12.4. The
power formula $A^n = QD^nQ^{-1}$ is also useful in statistics when studying
linear Markov processes, which we leave for you to discover in a statistics
course. Also, eigenvalues and eigenvectors are helpful in physics,
specifically in quantum mechanics.


Before we start in with examples, let us prove that if $A = QDQ^{-1}$ then
$e^A = Qe^DQ^{-1}$. As before, this simply says that similar matrices have similar
exponentials for the same similarity matrix Q.


Theorem 12.3.1. If A is an n x n matrix such that $A = QDQ^{-1}$, then
$e^A = Qe^DQ^{-1}$.


Proof. As in Section 12.2, we start with the definition of the matrix
exponential in Definition 12.2.5. Notice that in the following string of
equalities, Theorem 12.1.1 is used:

\[
\begin{aligned}
e^A &= \sum_{k=0}^{\infty} \frac{1}{k!} A^k \\
    &= \sum_{k=0}^{\infty} \frac{1}{k!} (QDQ^{-1})^k \\
    &= \sum_{k=0}^{\infty} \frac{1}{k!} QD^kQ^{-1} \\
    &= Q \left( \sum_{k=0}^{\infty} \frac{1}{k!} D^k \right) Q^{-1} \\
    &= Qe^DQ^{-1}
\end{aligned}
\]

where we know from Section 12.2 that if D is diagonal, then $e^D$ is also
diagonal, and is obtained by taking the exponential of the diagonal entries
of D.


Something else of interest is that you can find the formulas for sin(A) and 
cos(A) for any square diagonalizable matrix A using the Maclaurin series 
for sin(x) and cos(x). This is left as an exercise. 


Now let us do an example of diagonalizability where the eigenvalues and
eigenvectors are complex, not real, as we saw in Section 12.2. It turns out
to be the case that if A is a real symmetric matrix, then its eigenvalues are
all real, which allows us to take only real eigenvectors for these
eigenvalues, but this is a rare case. Even when A is a real matrix, its
eigenvalues and eigenvectors are very often complex, with the nonreal
eigenvalues occurring in complex conjugate pairs, as we shall see in the
example to follow.


Example 12.3.1. We will investigate whether the matrix

\[ A = \begin{pmatrix} 3 & -5 & 1 \\ 0 & 2 & -2 \\ -4 & 7 & 0 \end{pmatrix} \]

is diagonalizable, and if so, what the matrices, D and Q, that diagonalize it are. If D and 
Q exist, we will use them to find ef and A*. 


A = {{3, -5, 1}, {0, 2, -2}, {-4, 7, 0}};
(AMinusλI = A - λ IdentityMatrix[3]) // MatrixForm

3 - λ    -5        1
0         2 - λ   -2
-4        7       -λ

EigenPoly = Det[AMinusλI]
10 - 24 λ + 5 λ^2 - λ^3

λListExact = Solve[EigenPoly == 0, λ]
{{λ -> 1/3 (5 - 47/(-280 + 3 √20247)^(1/3) + (-280 + 3 √20247)^(1/3))},
 {λ -> 5/3 + (47 (1 + i √3))/(6 (-280 + 3 √20247)^(1/3)) - 1/6 (1 - i √3) (-280 + 3 √20247)^(1/3)},
 {λ -> 5/3 + (47 (1 - i √3))/(6 (-280 + 3 √20247)^(1/3)) - 1/6 (1 + i √3) (-280 + 3 √20247)^(1/3)}}

λListApprox = N[λListExact]
{{λ -> 0.456043}, {λ -> 2.27198 + 4.09462 i}, {λ -> 2.27198 - 4.09462 i}}

(X = {{x}, {y}, {z}}) // MatrixForm

x
y
z

AMinusλI /. λListApprox[[1]] // MatrixForm

2.54396    -5          1
0           1.54396   -2
-4          7         -0.456043

Det[AMinusλI /. λListApprox[[1]]]

-5.16345 x 10^-15

(Eqnsλ1 = AMinusλI.X /. λListApprox[[1]]) // MatrixForm

2.54396 x - 5 y + z
1.54396 y - 2 z
-4 x + 7 y - 0.456043 z

Solve[{2.54396 x - 5 y + z == 0, 1.54396 y - 2 z == 0,
  -4 x + 7 y - 0.456043 z == 0}, {x, y, z}]

{{x -> 0., y -> 0., z -> 0.}}


Notice above that if we solve for the eigenvector, denoted X, for the approximate real
eigenvalue λ = 0.456043, then we get the zero column, which we cannot use as a column
of Q. The problem is that when you use an approximation to an eigenvalue instead of its
exact value, as we just did, the matrix A - λI has a determinant that is tiny but not
exactly zero. This implies that the system of linear equations (A - λI)X = 0 has only the
solution X = 0, which cannot be used as an eigenvector.

One remedy is to not approximate the eigenvalues λ, but to use their exact values, or at
least to approximate the eigenvalues to enough decimal places that it makes no
difference. The exact values of a matrix's eigenvalues can generally be found only
when the size of the matrix is not more than 4. This would be very cumbersome to do by
hand, but with Mathematica it is almost a pleasure to do.
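Another workaround, offered here only as a sketch and not as the text's method, is to keep the approximate eigenvalue but fix z = 1 and solve just the first two equations, so that Mathematica never sees the nearly singular 3 x 3 system. The names lam, eqns, and sol are assumptions introduced for this illustration.

```mathematica
A = {{3, -5, 1}, {0, 2, -2}, {-4, 7, 0}};
lam = 0.456043;   (* approximate real eigenvalue computed above *)

(* Entries of (A - lam I).X with the free variable z fixed at 1 *)
eqns = (A - lam IdentityMatrix[3]).{x, y, 1};

(* Solve only the first two equations for x and y *)
sol = Solve[{eqns[[1]] == 0, eqns[[2]] == 0}, {x, y}]

(* The dropped third equation should be satisfied up to rounding error *)
eqns[[3]] /. First[sol]
```

The values of x and y obtained this way should agree, to the digits shown, with the eigenvector found from the exact eigenvalue below.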


AMinusλI /. λListExact[[1]]

[Output: the matrix A - λI with the exact real eigenvalue substituted for λ; each diagonal entry is expressed in terms of (-280 + 3 √20247)^(1/3).]

Simplify[Det[AMinusλI /. λListExact[[1]]]]

0

(Eqnsλ1Exact = Simplify[AMinusλI.X /. λListExact[[1]]]) // MatrixForm

[Output: the three exact entries of (A - λI)X, each linear in x, y, and z.]

Solve[Flatten[Eqnsλ1Exact] == 0, Flatten[X]]

Solve::svars : Equations may not give solutions for all "solve" variables. >>

[Output: exact expressions for x and y as multiples of z; for example,
y -> 6 z/(1 + 47/(-280 + 3 √20247)^(1/3) - (-280 + 3 √20247)^(1/3)).]

EVλ1 = N[%][[1]]
{x -> 2.15289 z, y -> 1.29537 z}


This tells us that we can use as eigenvector X, for the real eigenvalue λ = 0.456043, the
column $X = [2.15289 \;\; 1.29537 \;\; 1]^{T}$. Next we need to find the remaining eigenvectors X for
the other two complex eigenvalues.


AMinusλI /. λListExact[[2]]

[Output: the matrix A - λI with the exact complex eigenvalue λListExact[[2]] substituted for λ.]

Simplify[Det[AMinusλI /. λListExact[[2]]]]

0

(Eqnsλ2Exact = Simplify[AMinusλI.X /. λListExact[[2]]]) // MatrixForm

[Output: the three exact entries of (A - λI)X, each linear in x, y, and z.]

Solve[Flatten[Eqnsλ2Exact] == 0, Flatten[X]]
Solve::svars : Equations may not give solutions for all "solve" variables. >>

[Output: exact expressions for x and y as multiples of z; for example,
y -> 12 z/(2 - 47 (1 + i √3)/(-280 + 3 √20247)^(1/3) + (1 - i √3) (-280 + 3 √20247)^(1/3)).]

EVλ2 = N[%][[1]]
{x -> (-0.624523 - 0.172627 i) z, y -> (-0.0323018 + 0.486301 i) z}


This tells us that we can use as eigenvector X, for the complex eigenvalue λ = 2.27198 +
4.09462 i, the column

$$X = \begin{bmatrix} -0.624523 - 0.172627\,i \\ -0.0323018 + 0.486301\,i \\ 1 \end{bmatrix}$$


AMinusλI /. λListExact[[3]]

[Output: the matrix A - λI with the exact complex eigenvalue λListExact[[3]] substituted for λ.]

Simplify[Det[AMinusλI /. λListExact[[3]]]]
0

(Eqnsλ3Exact = Simplify[AMinusλI.X /. λListExact[[3]]]) // MatrixForm

[Output: the three exact entries of (A - λI)X, each linear in x, y, and z.]

Solve[Flatten[Eqnsλ3Exact] == 0, Flatten[X]]
Solve::svars : Equations may not give solutions for all "solve" variables. >>

[Output: exact expressions for x and y as multiples of z; for example,
y -> 12 z/(2 - 47 (1 - i √3)/(-280 + 3 √20247)^(1/3) + (1 + i √3) (-280 + 3 √20247)^(1/3)).]

EVλ3 = N[%][[1]]
{x -> (-0.624523 + 0.172627 i) z, y -> (-0.0323018 - 0.486301 i) z}


This tells us that we can use as eigenvector X, for the complex eigenvalue λ = 2.27198 -
4.09462 i, the column

$$X = \begin{bmatrix} -0.624523 + 0.172627\,i \\ -0.0323018 - 0.486301\,i \\ 1 \end{bmatrix}$$

Note that these are the complex conjugates of the previous values for λ and X. Next, we
can put the eigenvalues and eigenvectors together to form D and Q.
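Before assembling D and Q, it can be reassuring to confirm that the rounded eigenpairs really satisfy AX = λX. The check below is a sketch that reuses the decimal values printed above; the names lam2 and v2 and the tolerance 10^-4 are assumptions chosen for this illustration, with the tolerance picked to absorb the six-digit rounding.

```mathematica
A = {{3, -5, 1}, {0, 2, -2}, {-4, 7, 0}};
lam2 = 2.27198 + 4.09462 I;
v2 = {-0.624523 - 0.172627 I, -0.0323018 + 0.486301 I, 1};

(* Residual A.v - lam v; entries below the tolerance are chopped to 0 *)
Chop[A.v2 - lam2 v2, 10^-4]

(* Because A is real, the conjugate pair should pass the same check *)
Chop[A.Conjugate[v2] - Conjugate[lam2] Conjugate[v2], 10^-4]
```

Both residuals should come out as zero vectors, which illustrates why the third eigenpair is simply the complex conjugate of the second.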


(Diag = Chop[DiagonalMatrix[Table[λ /. λListApprox[[k]], {k, 1, 3}]]]) // MatrixForm

0.456043    0                       0
0           2.27198 + 4.09462 i     0
0           0                       2.27198 - 4.09462 i

(Q = Join[Join[X /. EVλ1 /. z -> 1, X /. EVλ2 /. z -> 1, 2], X /. EVλ3 /. z -> 1, 2]) // MatrixForm

2.15289    -0.624523 - 0.172627 i      -0.624523 + 0.172627 i
1.29537    -0.0323018 + 0.486301 i     -0.0323018 - 0.486301 i
1          1                           1

Chop[Q.Diag.Inverse[Q]] // MatrixForm

3.     -5.    1.
0      2.     -2.
-4.    7.     0

3     -5    1
0     2     -2
-4    7     0


Now we will compute both $e^{A}$ and $A^{4}$ since we now know that A is diagonalizable.


MatrixPower[A, 4] // MatrixForm 


—275 532 -97 
8 —180 216 
428 —736 32 


Chop[Q.MatrixPower[Diag, 4].Inverse[Q]] // MatrixForm 


—275. 532. —97. 
8. —180. 216. 


428. —736. 32. 


(eDiag = Chop[DiagonalMatrix[Table[E^λ /. λListApprox[[k]], {k, 1, 3}]]]) // MatrixForm

1.57782    0                        0
0          -5.61762 - 7.90598 i     0
0          0                        -5.61762 + 7.90598 i

Chop[Q.eDiag.Inverse[Q]] // MatrixForm

-4.57852    10.522      -0.375964
1.47101     -4.5703     4.79719
8.85888     -13.1127    -0.508612

N[Sum[MatrixPower[A, n]/n!, {n, 0, 25}]] // MatrixForm

-4.57852    10.522      -0.375964
1.47101     -4.5703     4.79719
8.85888     -13.1127    -0.508612

N[MatrixExp[A]] // MatrixForm

-4.57852    10.522      -0.375964
1.47101     -4.5703     4.79719
8.85888     -13.1127    -0.508612


Now you have seen an example of both the power and the pitfalls of using eigenvalues 
and eigenvectors. Clearly, if the size of the matrix A gets any bigger, it becomes hopeless 
to do anything by hand without driving yourself nuts with silly errors. Happily, 
Mathematica has built-in Eigenvalues and Eigenvectors commands that we will now 
use. These commands use numerical approximation methods that we leave for a 
numerical analysis course to explain to you. 


Simplify[Eigenvalues[A, Cubics -> True]]

{1/6 (10 + 47 (1 + i √3)/(-280 + 3 √20247)^(1/3) + (-1 + i √3) (-280 + 3 √20247)^(1/3)),
 1/6 (10 + 47 (1 - i √3)/(-280 + 3 √20247)^(1/3) + (-1 - i √3) (-280 + 3 √20247)^(1/3)),
 1/3 (5 - 47/(-280 + 3 √20247)^(1/3) + (-280 + 3 √20247)^(1/3))}

EigVals2 = N[Eigenvalues[A, Cubics -> True]]
{2.27198 + 4.09462 i, 2.27198 - 4.09462 i, 0.456043}

(Q2 = Transpose[N[Eigenvectors[A, Cubics -> True]]]) // MatrixForm

-0.624523 - 0.172627 i      -0.624523 + 0.172627 i      2.15289
-0.0323018 + 0.486301 i     -0.0323018 - 0.486301 i     1.29537
1.                          1.                          1.

(Diag2 = Chop[DiagonalMatrix[EigVals2]]) // MatrixForm

2.27198 + 4.09462 i    0                      0
0                      2.27198 - 4.09462 i    0
0                      0                      0.456043

Chop[Q2.Diag2.Inverse[Q2]] // MatrixForm

3.     -5.    1.
0      2.     -2.
-4.    7.     0

The matrix Q2 above is the matrix Q obtained from the results of the Eigenvectors
command, while Diag2 is the diagonal eigenvalue matrix D obtained from the results of
the Eigenvalues command.
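When only numerical values are needed, the eigenvalues and eigenvectors can also be requested together. The sketch below uses the built-in Eigensystem command on the approximate matrix N[A]; the names matA, vals, vecs, and Q3 are assumptions introduced here, and the numerical eigenvectors are scaled differently from those above, which does not affect the product Q D Q^-1.

```mathematica
matA = {{3, -5, 1}, {0, 2, -2}, {-4, 7, 0}};

(* Eigensystem returns {eigenvalues, eigenvectors}, with eigenvectors as rows *)
{vals, vecs} = Eigensystem[N[matA]];
Q3 = Transpose[vecs];   (* eigenvectors become the columns of Q3 *)

(* Should reproduce matA up to rounding error *)
Chop[Q3.DiagonalMatrix[vals].Inverse[Q3]] // MatrixForm
```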


The following example will give a square matrix A that is not
diagonalizable. Section 12.5 will explain in better detail how the rare case
of nondiagonalizability can occur.


Example 12.3.2. Let

$$A = \begin{bmatrix} -10 & 11 & -6 \\ -15 & 16 & -10 \\ -3 & 3 & -2 \end{bmatrix}$$

Let us see if A is in fact diagonalizable:

A = {{-10, 11, -6}, {-15, 16, -10}, {-3, 3, -2}};
(AMinusλI = A - λ IdentityMatrix[3]) // MatrixForm

-10 - λ    11        -6
-15        16 - λ    -10
-3         3         -2 - λ

Det[AMinusλI]

2 - 5 λ + 4 λ^2 - λ^3

Factor[%]

-(-2 + λ) (-1 + λ)^2

λList = Eigenvalues[A]
{2, 1, 1}

Eigenvectors[A]

{{26, 30, 3}, {1, 1, 0}, {0, 0, 0}}


The reason why A is not diagonalizable is that A has only two eigenvalues, λ = 1 and λ =
2, with λ = 1 occurring twice as a root of the characteristic polynomial, while λ = 2 occurs
only once. You might think that this is no problem, and that we can use the eigenvalue λ
= 1 twice in the diagonal matrix D. However, from the result of the Eigenvectors command, we
see that there is essentially only one independent eigenvector for the eigenvalue λ = 1,
namely v = (1, 1, 0), since all others are multiples of this single eigenvector. The reason
for this is that the set of all eigenvectors for the eigenvalue λ = 1, with the zero vector
thrown in, is a subspace of $\mathbb{C}^{3}$ that has dimension one. The eigenvector v is a basis
for this subspace of $\mathbb{C}^{3}$, and we need this subspace to have dimension two if we are to
use λ = 1 as an eigenvalue twice in D. That is, we need two independent eigenvectors for the
eigenvalue λ = 1 if we are to use it twice on the diagonal of D and these eigenvectors as
the corresponding columns of Q. The eigenvectors that form the columns of Q must form an
independent set, since a square matrix Q is invertible if and only if its columns (or rows)
form an independent set.
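The dimension count in this argument can be checked directly. The following sketch uses the built-in NullSpace and MatrixRank commands; it is an illustration added here rather than part of the original example. The eigenspace for λ = 1 is the null space of A - I, so its dimension is 3 minus the rank of A - I.

```mathematica
A = {{-10, 11, -6}, {-15, 16, -10}, {-3, 3, -2}};

(* Basis of the eigenspace for the repeated eigenvalue 1 *)
NullSpace[A - IdentityMatrix[3]]

(* Rank 2 means the eigenspace has dimension 3 - 2 = 1 *)
MatrixRank[A - IdentityMatrix[3]]

(* Total number of independent eigenvectors over both eigenvalues *)
Length[NullSpace[A - IdentityMatrix[3]]] +
 Length[NullSpace[A - 2 IdentityMatrix[3]]]
```

Since the last count is 2 rather than 3, there are not enough independent eigenvectors to fill the columns of an invertible Q.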


The last example of this section will look at diagonalizing a 4 x 4 matrix
A, all of whose entries are complex numbers. We have only looked at
problems in this text involving complex numbers on a few occasions, but
you should still know that this type of problem can occur and works very
much like the real case. Note that in A below, we have placed a decimal
point in the real part of $A_{1,1}$ so that Mathematica will automatically
approximate A's eigenvalues, since it treats $A_{1,1}$ as an
approximate value. You should remove this decimal point from $A_{1,1}$ in
order to see what the Eigenvalues command produces instead.


Example 12.3.3. Consider

$$A = \begin{bmatrix} 2 - i & 5 & i & 0 \\ 3 + i & -i & \pi i & -2 \\ 1 & 2 & 3 & 4 \\ -2i & 0 & -1 & i \end{bmatrix}$$

Let us see if A is diagonalizable or not, and find $A^{5}$ and $e^{A}$ if it is.

A = {{2. - I, 5, I, 0}, {3 + I, -I, π I, -2}, {1, 2, 3, 4}, {-2 I, 0, -1, I}};
N[Eigenvalues[A]]

{6.21084 + 1.00782 i, 0.650175 - 3.22836 i,
 -3.07388 - 1.08297 i, 1.21286 + 2.30351 i}


(AMinusλI = A - λ IdentityMatrix[4]) // MatrixForm

(2. - i) - λ    5         i        0
3 + i           -i - λ    i π      -2
1               2         3 - λ    4
-2 i            0         -1       i - λ

EigenPoly = Collect[Det[AMinusλI], λ]
(-171.805 - 37.2832 i) + (17. + 9.85841 i) λ - (4. + 15.2832 i) λ^2 - (5. - i) λ^3 + λ^4

Factor[%]

1. ((-6.21084 - 1.00782 i) + λ) ((-1.21286 - 2.30351 i) + λ)
 ((-0.650175 + 3.22836 i) + λ) ((3.07388 + 1.08297 i) + λ)


λList = Eigenvalues[A]
{6.21084 + 1.00782 i, 0.650175 - 3.22836 i, -3.07388 - 1.08297 i, 1.21286 + 2.30351 i}

EigVects = Chop[Eigenvectors[A]]

{{0.673459, 0.522023 + 0.193178 i, 0.386294 - 0.225716 i, -0.0624239 - 0.180445 i},
 {0.647492, -0.0682944 - 0.245032 i, -0.217686 + 0.532529 i, 0.429954 - 0.0146297 i},
 {-0.642789 + 0.00657089 i, 0.684622, 0.019995 + 0.16113 i, -0.16835 - 0.251727 i},
 {-0.0823545 - 0.225002 i, 0.161624 - 0.156733 i, 0.688713, -0.36793 + 0.531232 i}}


(Diag = Chop[DiagonalMatrix[λList]]) // MatrixForm

6.21084 + 1.00782 i    0                       0                       0
0                      0.650175 - 3.22836 i    0                       0
0                      0                       -3.07388 - 1.08297 i    0
0                      0                       0                       1.21286 + 2.30351 i

Q = Transpose[EigVects];
Chop[Q.Diag.Inverse[Q]] // MatrixForm

2. - 1. i    5.       1. i         0
3. + 1. i    -1. i    3.14159 i    -2.
1.           2.       3.           4.
-2. i        0        -1.          1. i

N[A] // MatrixForm

2. - 1. i    5.          0. + 1. i         0.
3. + 1. i    0. - 1. i   0. + 3.14159 i    -2.
1.           2.          3.                4.
0. - 2. i    0.          -1.               0. + 1. i


Q.MatrixPower[Diag, 5].Inverse[Q] // MatrixForm

3572.04 + 1915.14 i    3201.22 + 2107.7 i     -838.583 + 3720.12 i    -1462.38 + 1829.19 i
2081.93 + 2707.5 i     1978.97 + 2078.07 i    -1854.98 + 2737.09 i    -1755.36 + 852.691 i
2631.68 + 93.9114 i    2410.19 + 114.248 i    995.886 + 2358.09 i     -181.489 + 1713.58 i
316.425 - 1145.43 i    218.673 - 957.761 i    1069.3 - 176.086 i      741.788 + 86.7699 i

MatrixPower[A, 5] // MatrixForm

3572.04 + 1915.14 i    3201.22 + 2107.7 i     -838.583 + 3720.12 i    -1462.38 + 1829.19 i
2081.93 + 2707.5 i     1978.97 + 2078.07 i    -1854.98 + 2737.09 i    -1755.36 + 852.691 i
2631.68 + 93.9114 i    2410.19 + 114.248 i    995.886 + 2358.09 i     -181.489 + 1713.58 i
316.425 - 1145.43 i    218.673 - 957.761 i    1069.3 - 176.086 i      741.788 + 86.7699 i

So we see that $A^{5} = Q D^{5} Q^{-1}$.


(eDiag = Chop[DiagonalMatrix[E^λList]]) // MatrixForm

265.849 + 421.245 i    0                        0                          0
0                      -1.90867 + 0.166027 i    0                          0
0                      0                        0.0216736 - 0.0408477 i    0
0                      0                        0                          -2.24955 + 2.49999 i

Q.eDiag.Inverse[Q] // MatrixForm

150.361 + 137.117 i    136.106 + 124.145 i    -77.8461 + 181.94 i     -95.1077 + 76.343 i
77.2814 + 149.681 i    69.9951 + 136.143 i    -112.086 + 118.141 i    -93.9875 + 32.6074 i
132.292 + 27.3927 i    120.463 + 26.0749 i    14.9997 + 131.376 i     -26.4672 + 76.1305 i
23.1879 - 53.2913 i    20.0117 - 48.8503 i    54.7942 + 4.09607 i     26.3568 + 19.2347 i

N[Sum[MatrixPower[A, n]/n!, {n, 0, 25}]] // MatrixForm

150.361 + 137.117 i    136.106 + 124.145 i    -77.8461 + 181.94 i     -95.1077 + 76.343 i
77.2814 + 149.681 i    69.9951 + 136.143 i    -112.086 + 118.141 i    -93.9875 + 32.6074 i
132.292 + 27.3927 i    120.463 + 26.0749 i    14.9997 + 131.376 i     -26.4672 + 76.1305 i
23.1879 - 53.2913 i    20.0117 - 48.8503 i    54.7942 + 4.09607 i     26.3568 + 19.2347 i

N[MatrixExp[A]] // MatrixForm

150.361 + 137.117 i    136.106 + 124.145 i    -77.8461 + 181.94 i     -95.1077 + 76.343 i
77.2814 + 149.681 i    69.9951 + 136.143 i    -112.086 + 118.141 i    -93.9875 + 32.6074 i
132.292 + 27.3927 i    120.463 + 26.0749 i    14.9997 + 131.376 i     -26.4672 + 76.1305 i
23.1879 - 53.2913 i    20.0117 - 48.8503 i    54.7942 + 4.09607 i     26.3568 + 19.2347 i
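The pattern used throughout this section, namely diagonalize, apply a scalar function to the diagonal entries, and multiply back, can be collected into one small helper. This is only a sketch: the name funcOfMatrix, the variable M, and the tolerance 10^-6 are assumptions introduced for illustration, and the helper assumes that its matrix argument is diagonalizable.

```mathematica
(* Hypothetical helper: apply the scalar function f to a diagonalizable matrix m *)
funcOfMatrix[f_, m_] := Module[{vals, vecs, q},
  {vals, vecs} = Eigensystem[m];
  q = Transpose[vecs];                      (* eigenvectors as columns *)
  q.DiagonalMatrix[f /@ vals].Inverse[q]]   (* Q f(D) Q^-1 *)

(* The matrix of Example 12.3.3 *)
M = {{2. - I, 5, I, 0}, {3 + I, -I, Pi I, -2}, {1, 2, 3, 4}, {-2 I, 0, -1, I}};

(* Both differences should chop to the 4 x 4 zero matrix *)
Chop[funcOfMatrix[#^5 &, M] - MatrixPower[M, 5], 10^-6]
Chop[funcOfMatrix[Exp, M] - MatrixExp[M], 10^-6]
```

The same helper could be tried with Sin or Cos in place of Exp when working on the homework problems below.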


Homework Problems 


1. Find formulas for sin(A) and cos(A) for a square real matrix A 
using the Maclaurin series for sin(x) and cos(x). If the matrix A is 
diagonalizable, then find a formula for sin (A) and cos (A) using this 
diagonalizability. Check this formula on the diagonalizable real 
symmetric matrix 


5 2 
BEE 


to see if it works. Is the identity

$$\sin^{2}(A) + \cos^{2}(A) = I_{2}$$

for the identity matrix $I_{2}$ still true for this matrix A?
2. Let 


1 7 26 34 
A= 5 4 27 8 
3 —16 —4 


See if A is diagonalizable, and if so, find the matrices D and Q that 
diagonalize it. If D and Q exist, then use them to find ef and A’. 




3. If you did problem 1, then use your results of problem 1 on the 
matrix A of problem 2. 

4. Let A be a square nondiagonalizable matrix. Explain why $KAK^{-1}$
is also nondiagonalizable for any invertible matrix K of the same
size as A.

5. Let A be a square diagonalizable matrix. Explain why $KAK^{-1}$ is
also diagonalizable for any invertible matrix K of the same size as
A.


6. Let 


0 0 
6 0 
7 6 


Do 


— í 


Show that A is not diagonalizable. 


7. Let 
9 0 0 p 
5 9 0 
A= U O Ti H 
wU se IH 


Show that A is not diagonalizable. 

8. Show that the upper and lower triangular square matrices with all 
identical diagonal entries are all nondiagonalizable unless they are 
actually diagonal matrices. 

9. Let A and B be two real 2 x 2 diagonalizable matrices with $A =
QDQ^{-1}$ and $B = KEK^{-1}$. Show that the block 4 x 4 matrix created
from A and B with A and B on its diagonal, that is, the 4 x 4 matrix

$$\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$

where 0 is the zero 2 x 2 matrix, is also diagonalizable.

10. If you did problem 9, show that if at least one of the matrices A
and B is not diagonalizable, then the block matrix

$$\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$

is also not diagonalizable.


Mathematica Problems 


1. Let 
2+% —4—i 1+2% 5-i 
3-i 1—i į —2 
A= a 7 -3 i 
—2i 8 +i =l 3 +i 


See if A is diagonalizable or not, and find A and ef if it is 
diagonalizable. Of course, check that both A and ef are correct. 


2. If you did homework problem 1, then use your results from this 
problem on the matrix A of Mathematica problem 1. 


3. Show that the following matrix is not diagonalizable: 


4-lli 4-2 6-3 
G= | —4+2: 14-16% 12-61 
2—i —4+ 2i 9i 


4. Let A and B be defined as follows: 
2- 5i 8 — 2i 30 — 15% i 1—2: l1+i 
A= —-l1-10i 10-4 -—21-12i |,B=]|] 24+2% 1-H 2-i 
20+15% -16+6: —12+9i l—i 2+i 5i 
(a) Show that the 6 x 6 block matrix 
A T 


c=! 4 B 


is diagonalizable using the diagonalizability of both A and B, and 
using this diagonalizability, find e? and G?. 




(b) Can you also find VG? Do so if it is possible, and then 
check your result. If not, then explain why not. 
5. Let A and B be the two matrices from the previous problem. Now 
let 


_[A]s 
c-a B 


Is G still diagonalizable? If yes, then diagonalize G. If not, then 
explain why not. 
6. Define G as follows: 


1 -1 2 1 0 3 

-] -2 0 | | a: 3 

2 l -l 1 01 

ie i =l 2 a | 
0 1 =I I 62 

3 0 1-1 2 0 


Use the diagonalizability of G to find both e? and VG; check your 
results. 


Research Projects 


1. For a real n x n matrix A, define the n x n matrix e and see 
if diagonalizability can help you find eA as it did for sin(A) and 
e^. Is the same thing possible for e°°) ?Can you find a way to 
decide whether 


d 


ae =A cos(A tjent) 


sin(4) 


for a real variable £? 




12.4 Solving a Square First-Order Linear System of Differential Equations


Now we will apply the diagonalizability of a square matrix A to solving a
square first-order linear system of homogeneous ordinary differential
equations. From now on, we will use the abbreviation ODE for the term
ordinary differential equation. The form of a square 3 x 3 first-order linear
homogeneous ODE system is

$$\frac{dX}{dt} = A X(t) \tag{12.7}$$

where

$$X(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}, \qquad \frac{dX}{dt} = \begin{bmatrix} dx/dt \\ dy/dt \\ dz/dt \end{bmatrix}$$

and A is a square 3 x 3 matrix of real constants. Hence, the matrix
equation can be expressed as

$$\begin{aligned} \frac{dx}{dt} &= A_{1,1}\, x(t) + A_{1,2}\, y(t) + A_{1,3}\, z(t) \\ \frac{dy}{dt} &= A_{2,1}\, x(t) + A_{2,2}\, y(t) + A_{2,3}\, z(t) \\ \frac{dz}{dt} &= A_{3,1}\, x(t) + A_{3,2}\, y(t) + A_{3,3}\, z(t) \end{aligned}$$

This is called a linear system since the RHS of each of these equations is a
linear combination of the unknown functions x(t), y(t), and z(t). You will
see later why it is called homogeneous.




In general, the derivative of a matrix of functions is the matrix of the 
corresponding derivatives of the original matrix’s entries. The usual 
derivative rules also apply to matrix functions, such as the product rule. 


First let us do an example so that you can see what we have in mind where 
all our eigenvalues and eigenvectors are real. 


Example 12.4.1. Let us solve the 3 x 3 square first-order linear system of homogeneous
differential equations given by the three equations

$$\begin{aligned} \frac{dx}{dt} &= 5x(t) - 9y(t) + z(t) \\ \frac{dy}{dt} &= -9x(t) + 2y(t) + \pi z(t) \\ \frac{dz}{dt} &= x(t) + \pi y(t) - 4z(t) \end{aligned} \tag{12.8}$$

where x(t), y(t), and z(t) are three unknown functions of t with the initial conditions

$$x(0) = 7, \quad y(0) = 15, \quad z(0) = 21 \tag{12.9}$$

This system of three differential equations can be written as a single matrix differential
equation of the form (12.7), with

$$A = \begin{bmatrix} 5 & -9 & 1 \\ -9 & 2 & \pi \\ 1 & \pi & -4 \end{bmatrix}$$

The initial conditions become the single initial condition

$$X(0) = \begin{bmatrix} x(0) \\ y(0) \\ z(0) \end{bmatrix} = \begin{bmatrix} 7 \\ 15 \\ 21 \end{bmatrix}$$

The derivative matrix $\frac{dX}{dt}$ of the matrix X(t) is the matrix of the corresponding
derivatives of the entry functions of X(t), and this is the definition of the derivative
matrix.


In order to solve this matrix differential equation for the column X(t), we first look at the
simple analogous situation that we already know how to solve, namely

$$\frac{dw}{dt} = a\, w(t) \tag{12.10}$$

where w(t) is a normal function of t and a is a real constant. This simple differential
equation has as its solution

$$w(t) = C e^{at}$$

where C is an arbitrary constant. This solution w(t) can be found by knowing the
derivative rules and realizing that w(t) must be an exponential function, or by using the
method of separation of variables if you have had previous experience solving
differential equations.

In case you do not recall the method of separation of variables, let us use it now to solve
this differential equation. When considering equation (12.10), if we treat $\frac{dw}{dt}$ as a
fraction (it is not a fraction), then we can multiply the equation by dt to obtain

$$dw = a\, w\, dt$$

Notice that we have expressed w(t) as w in this expression. Next, we separate the
variables w and t to different sides of the equation:

$$\frac{dw}{w} = a\, dt$$

Then we integrate both sides of this equation, with respect to the appropriate variable on
each side:

$$\int \frac{1}{w}\, dw = \int a\, dt$$

Both of these are simple integrals to compute, which gives

$$\ln(|w|) = at + D$$

for an arbitrary constant D. This equation can be solved for w(t) to obtain

$$|w(t)| = C e^{at}$$

where $C = e^{D}$ is a positive arbitrary constant. If we allow C to be negative or 0, then our
solution is

$$w(t) = C e^{at} \tag{12.11}$$

where C is an arbitrary constant. You can easily check that this solution works by
plugging it back into the original differential equation (12.10).

Now, if the differential equation (12.10) has the initial condition $w(0) = w_0 \in \mathbb{R}$, then
the constant C is no longer arbitrary, but is a specific value determined by the initial
condition. So we can determine the value of C by plugging the initial condition
information into our solution (12.11). This gives

$$w(0) = w_0 = C e^{0} = C$$

which tells us that in the case of an initial condition $w(0) = w_0$ the solution to our
differential equation is

$$w(t) = w_0\, e^{at} \tag{12.12}$$


This tells us that we should expect the matrix differential equation (12.7) with initial
condition $X(0) = X_0 = [x(0) \;\; y(0) \;\; z(0)]^{T}$ to have solution

$$X(t) = e^{At} X_0 \tag{12.13}$$

where $X_0$ is the constant 3 x 1 column of initial conditions. The column $X_0$ must be
placed on the RHS in the product since the other order of multiplication is impossible
because $e^{At}$ is a 3 x 3 matrix of functions of t since A is 3 x 3.
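As a brief supporting computation, not taken from the text, one can see why this guess works by differentiating the series for the matrix exponential term by term. The sketch assumes that termwise differentiation of the series is valid, which is a standard fact not proved here:

```latex
\frac{d}{dt}\left(e^{At} X_0\right)
  = \frac{d}{dt}\left(\sum_{k=0}^{\infty} \frac{t^{k}}{k!} A^{k}\right) X_0
  = \left(\sum_{k=1}^{\infty} \frac{t^{k-1}}{(k-1)!} A^{k}\right) X_0
  = A \left(\sum_{k=0}^{\infty} \frac{t^{k}}{k!} A^{k}\right) X_0
  = A\, e^{At} X_0,
\qquad
e^{A \cdot 0} X_0 = I X_0 = X_0 .
```

So $X(t) = e^{At} X_0$ satisfies both the differential equation (12.7) and the initial condition, in exact analogy with $w(t) = w_0 e^{at}$.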


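If the matrix A is diagonalizable, then we know that $A = QDQ^{-1}$, and so $At = Q(Dt)Q^{-1}$,
since t is a real scalar variable. We must also have

$$e^{At} = Q e^{Dt} Q^{-1}$$

where Dt corresponds to scalar multiplication of D by the scalar variable t. In other
words, if

$$D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}$$

then

$$e^{Dt} = \begin{bmatrix} e^{\lambda_1 t} & 0 & 0 \\ 0 & e^{\lambda_2 t} & 0 \\ 0 & 0 & e^{\lambda_3 t} \end{bmatrix}$$

since

$$Dt = \begin{bmatrix} \lambda_1 t & 0 & 0 \\ 0 & \lambda_2 t & 0 \\ 0 & 0 & \lambda_3 t \end{bmatrix}$$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the eigenvalues of A.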


Now let us have Mathematica compute all of this for us by diagonalizing A. Since A is a
real symmetric matrix, all of its eigenvalues and eigenvectors will be real:

A = {{5., -9., 1.}, {-9., 2., π}, {1., π, -4.}};
(Q = Transpose[N[Eigenvectors[A]]]) // MatrixForm

0.752282      -0.48108     0.450148
-0.654218     -0.626257    0.42403
-0.0779158    0.613485     0.785853

EigVals = N[Eigenvalues[A]]
{12.7232, -7.99117, -1.73205}

(Diag = DiagonalMatrix[EigVals]) // MatrixForm

12.7232    0.          0.
0.         -7.99117    0.
0.         0.          -1.73205

Q.Diag.Inverse[Q] // MatrixForm

5.     -9.        1.
-9.    2.         3.14159
1.     3.14159    -4.

N[A] // MatrixForm

5.     -9.        1.
-9.    2.         3.14159
1.     3.14159    -4.

We see from the last two computations that A is diagonalizable for the Q and D (or Diag)
above. Now we can solve our first-order linear system of differential equations.

(eDiag = DiagonalMatrix[E^(EigVals t)]) // MatrixForm

e^(12.7232 t)    0.                0.
0.               e^(-7.99117 t)    0.
0.               0.                e^(-1.73205 t)

(eAt = Chop[Simplify[Q.eDiag.Inverse[Q]]]) // MatrixForm

{{0.231438 e^(-7.99117 t) + 0.202633 e^(-1.73205 t) + 0.565929 e^(12.7232 t),
  0.30128 e^(-7.99117 t) + 0.190876 e^(-1.73205 t) - 0.492156 e^(12.7232 t),
  -0.295136 e^(-7.99117 t) + 0.35375 e^(-1.73205 t) - 0.0586147 e^(12.7232 t)},
 {0.30128 e^(-7.99117 t) + 0.190876 e^(-1.73205 t) - 0.492156 e^(12.7232 t),
  0.392198 e^(-7.99117 t) + 0.179801 e^(-1.73205 t) + 0.428001 e^(12.7232 t),
  -0.384199 e^(-7.99117 t) + 0.333225 e^(-1.73205 t) + 0.0509739 e^(12.7232 t)},
 {-0.295136 e^(-7.99117 t) + 0.35375 e^(-1.73205 t) - 0.0586147 e^(12.7232 t),
  -0.384199 e^(-7.99117 t) + 0.333225 e^(-1.73205 t) + 0.0509739 e^(12.7232 t),
  0.376364 e^(-7.99117 t) + 0.617565 e^(-1.73205 t) + 0.00607087 e^(12.7232 t)}}

X0 = {{7}, {15}, {21}};
(X = Chop[Simplify[eAt.X0]]) // MatrixForm

-0.058579 e^(-7.99117 t) + 11.7103 e^(-1.73205 t) - 4.65175 e^(12.7232 t)
-0.0762565 e^(-7.99117 t) + 11.0309 e^(-1.73205 t) + 4.04537 e^(12.7232 t)
0.0747012 e^(-7.99117 t) + 20.4435 e^(-1.73205 t) + 0.481794 e^(12.7232 t)

X /. t -> 0
{{7.}, {15.}, {21.}}

Note that the solution column X(t) above is written in a particularly nice way; namely, if
the three eigenvalues of A are called $\lambda_1$, $\lambda_2$, and $\lambda_3$, then

$$X(t) = e^{\lambda_1 t} Z_1 + e^{\lambda_2 t} Z_2 + e^{\lambda_3 t} Z_3 \tag{12.14}$$

for constant column vectors $Z_1$, $Z_2$, and $Z_3$. If we compute the derivative $\frac{dX}{dt}$ for this
linear combination, we get

$$\frac{dX}{dt} = \lambda_1 e^{\lambda_1 t} Z_1 + \lambda_2 e^{\lambda_2 t} Z_2 + \lambda_3 e^{\lambda_3 t} Z_3$$

since the Z's are constant columns.

Now, switching to compute the other side of the differential equation, we have

$$A X(t) = e^{\lambda_1 t} A Z_1 + e^{\lambda_2 t} A Z_2 + e^{\lambda_3 t} A Z_3$$

If we compare these two equations, we see that the expression given by (12.14) is going to
be a solution to our differential equation (12.7) if we have $A Z_k = \lambda_k Z_k$ for k = 1, 2, 3. In
particular, it is now easy to see that for eigenvectors $X_1$, $X_2$, and $X_3$ (forming the columns
of Q) for the corresponding eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$ of A, each of the functions $e^{\lambda_k t} X_k$,
k = 1, 2, 3, is a solution to our system of differential equations.


In a homogeneous linear system of differential equations, the linearity of the system will
tell us that any linear combination of solutions is also a solution, that is, if U(t) and V(t)
are two solutions, and a and b are two real arbitrary constants, then X(t) = aU(t) + bV(t) is
also a solution since

$$\begin{aligned} \frac{dX}{dt} &= a \frac{dU}{dt} + b \frac{dV}{dt} \\ &= a A U(t) + b A V(t) \\ &= A\left(a U(t) + b V(t)\right) \\ &= A X(t) \end{aligned}$$

So, from what we have seen above, we now expect that the solutions X(t) can be written
as

$$X(t) = a_1 e^{\lambda_1 t} X_1 + a_2 e^{\lambda_2 t} X_2 + a_3 e^{\lambda_3 t} X_3 \tag{12.15}$$

for real arbitrary constants $a_1$, $a_2$, and $a_3$, where the eigenvectors $X_1$, $X_2$, and $X_3$ are for
the corresponding eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$ of A, when A is diagonalizable.

Now the initial condition $X(0) = X_0$ will determine uniquely the three values of the
arbitrary constants $a_1$, $a_2$, and $a_3$ since

$$\begin{aligned} X_0 = X(0) &= a_1 e^{\lambda_1 \cdot 0} X_1 + a_2 e^{\lambda_2 \cdot 0} X_2 + a_3 e^{\lambda_3 \cdot 0} X_3 \\ &= a_1 X_1 + a_2 X_2 + a_3 X_3 \end{aligned}$$

So you can solve the linear system

$$a_1 X_1 + a_2 X_2 + a_3 X_3 = X_0$$

for a unique solution for the three unknown coefficients $a_1$, $a_2$, and $a_3$. This linear system can
also be written as the matrix equation

$$Q \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = X_0$$

giving

$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = Q^{-1} X_0 \tag{12.16}$$

for the columns of Q the eigenvectors $X_1$, $X_2$, and $X_3$ in this order. We will verify all of
this for our current example shortly.


So, now we have another way to solve the n x n homogeneous first-order linear system of
differential equations of the form (12.7) with initial condition $X(0) = X_0$ for A
diagonalizable. The n x 1 solution column X(t) is given by

$$X(t) = \sum_{k=1}^{n} a_k e^{\lambda_k t} X_k$$

where A's n eigenvalues are the $\lambda_k$ terms, their corresponding eigenvectors are the $X_k$
columns, and the $a_k$ coefficients are obtained by solving the linear system

$$\sum_{k=1}^{n} a_k X_k = X_0$$

Similar to equation (12.16), the column matrix of coefficients is given by

$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = Q^{-1} X_0 \tag{12.17}$$

where the columns of the matrix Q are the eigenvectors $X_1, X_2, \ldots, X_n$, in this order. You
can check that this solution works by plugging it into the differential equation (12.7) and
also checking that $X(0) = X_0$. One thing to remember in solving differential equations
with initial conditions is that their solution is unique, and so any method you use for
getting the answer will give the same result as any other method, although the answers
may appear to be different or rearranged.
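The recipe just described can be wrapped in a short function. The following is a sketch under stated assumptions: the name solveLinearIVP and its argument names are hypothetical, the matrix is assumed to be diagonalizable, and the initial condition is passed as a flat list rather than as a column matrix.

```mathematica
(* Hypothetical helper: solve X'(t) = mat.X(t), X(0) = x0, for diagonalizable mat *)
solveLinearIVP[mat_, x0_, t_] := Module[{vals, vecs, q, coeffs},
  {vals, vecs} = Eigensystem[N[mat]];
  q = Transpose[vecs];             (* eigenvectors X_k as columns *)
  coeffs = LinearSolve[q, x0];     (* the a_k, solving q.a = x0 as in (12.17) *)
  Chop[q.(coeffs Exp[vals t])]]    (* sum over k of a_k e^(lambda_k t) X_k *)

(* Usage on the system of Example 12.4.1 *)
sol = solveLinearIVP[{{5, -9, 1}, {-9, 2, Pi}, {1, Pi, -4}}, {7, 15, 21}, t]

(* Should reproduce the initial condition up to rounding *)
sol /. t -> 0
```

Using LinearSolve rather than Inverse is a design choice of this sketch; it avoids forming the inverse of Q explicitly, but either approach gives the same coefficients.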


(DX = D[X, t]) // MatrixForm

0.468115 e^(-7.99117 t) - 20.2829 e^(-1.73205 t) - 59.1853 e^(12.7232 t)
0.609379 e^(-7.99117 t) - 19.106 e^(-1.73205 t) + 51.4701 e^(12.7232 t)
-0.596951 e^(-7.99117 t) - 35.4091 e^(-1.73205 t) + 6.12997 e^(12.7232 t)

Simplify[A.X] // MatrixForm

0.468115 e^(-7.99117 t) - 20.2829 e^(-1.73205 t) - 59.1853 e^(12.7232 t)
0.609379 e^(-7.99117 t) - 19.106 e^(-1.73205 t) + 51.4701 e^(12.7232 t)
-0.596951 e^(-7.99117 t) - 35.4091 e^(-1.73205 t) + 6.12997 e^(12.7232 t)


The last two outputs show that our solution X does satisfy the matrix differential equation
(12.7), as expected. Now let us recompute the solution X(t) using (12.15), where $a_1 X_1 +
a_2 X_2 + a_3 X_3 = X_0$, or $[a_1 \;\; a_2 \;\; a_3]^{T} = Q^{-1} X_0$, and A's eigenvalues are $\lambda_1$, $\lambda_2$, and $\lambda_3$ with
corresponding eigenvectors $X_1$, $X_2$, and $X_3$ forming the columns of Q. We will compute
the $a_k$ values directly as well as by using the formula $Q^{-1} X_0$ for them:


X1 = Q[[All, 1]]
{0.752282, -0.654218, -0.0779158}

X2 = Q[[All, 2]]
{-0.48108, -0.626257, 0.613485}

X3 = Q[[All, 3]]
{0.450148, 0.42403, 0.785853}

LeftSideSystem = a X1 + b X2 + c X3

{0.752282 a - 0.48108 b + 0.450148 c, -0.654218 a - 0.626257 b + 0.42403 c,
 -0.0779158 a + 0.613485 b + 0.785853 c}

CoeffsA = Solve[{LeftSideSystem == Flatten[X0]}, {a, b, c}]

{{a -> -6.18352, b -> 0.121765, c -> 26.0144}}

Flatten[Inverse[Q].X0]

{-6.18352, 0.121765, 26.0144}

(a Exp[EigVals[[1]] t] X1 + b Exp[EigVals[[2]] t] X2 + c Exp[EigVals[[3]] t] X3) /. Flatten[CoeffsA]

{-0.058579 e^(-7.99117 t) + 11.7103 e^(-1.73205 t) - 4.65175 e^(12.7232 t),
 -0.0762565 e^(-7.99117 t) + 11.0309 e^(-1.73205 t) + 4.04537 e^(12.7232 t),
 0.0747012 e^(-7.99117 t) + 20.4435 e^(-1.73205 t) + 0.481794 e^(12.7232 t)}

X // MatrixForm

-0.058579 e^(-7.99117 t) + 11.7103 e^(-1.73205 t) - 4.65175 e^(12.7232 t)
-0.0762565 e^(-7.99117 t) + 11.0309 e^(-1.73205 t) + 4.04537 e^(12.7232 t)
0.0747012 e^(-7.99117 t) + 20.4435 e^(-1.73205 t) + 0.481794 e^(12.7232 t)


Our three solution functions x(t), y(t), and z(t) to the linear system of differential equations
are X[[1, 1]], X[[2, 1]], and X[[3, 1]], respectively, from above. We have also checked that X(t) is the
solution to this system of differential equations and satisfies the initial conditions.


Mathematica can also solve systems of differential equations using the command DSolve. 
Let us use it now to check that our result above is correct. 


RHS = Flatten[A.{{x[t]}, {y[t]}, {z[t]}}];

Solns = DSolve[{x'[t] == RHS[[1]], y'[t] == RHS[[2]], z'[t] == RHS[[3]],
   x[0] == 7, y[0] == 15, z[0] == 21}, {x[t], y[t], z[t]}, t][[1]];
Expand[{x[t], y[t], z[t]} /. Solns]

{-0.058579 e^(-7.99117 t) + 11.7103 e^(-1.73205 t) - 4.65175 e^(12.7232 t),
 -0.0762565 e^(-7.99117 t) + 11.0309 e^(-1.73205 t) + 4.04537 e^(12.7232 t),
 0.0747012 e^(-7.99117 t) + 20.4435 e^(-1.73205 t) + 0.481794 e^(12.7232 t)}
As the final part of this example, let us plot this solution X(t) as a curve in space for t ∈
[-0.25, 0.25]. This plot will be a parametric plot that gives a space curve. The initial
condition point at $X_0$ is also plotted (see Fig. 12.1).

Figure 12.1: Solution to the ODE system (12.8), with initial
conditions given in equation (12.9).

PlotX0 = Graphics3D[{Red, PointSize[0.03], Point[Flatten[X0]]}];

PlotSoln = ParametricPlot3D[{x[t], y[t], z[t]} /. Solns, {t, -0.25,
   0.25}, PlotRange -> All, PlotStyle -> {Blue, Thickness[0.007]}];

Show[PlotSoln, PlotX0]


The last example took a great deal of time and effort since we used it to
illustrate how several methods might be applied to solve the same
problem. Happily, in Example 12.4.1, the eigenvalues and eigenvectors
were all real since the coefficient matrix A was a real symmetric matrix.
The next example will not be as nice as the last one. We will choose a
system whose corresponding real-valued matrix A has complex
eigenvalues and eigenvectors. However, the system of homogeneous
differential equations will still be completely real, and we still seek a
completely real set of solution functions.
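To see why a real solution is still to be expected, note that the terms belonging to a complex conjugate pair of eigenvalues combine into a real expression. The identity below is a sketch added for orientation; the symbols $c = p + iq$, $\alpha$, and $\beta$ are notation introduced here and are not taken from the text:

```latex
c\, e^{(\alpha + i\beta)t} + \bar{c}\, e^{(\alpha - i\beta)t}
  = 2\,\mathrm{Re}\!\left( (p + iq)\, e^{\alpha t} (\cos\beta t + i \sin\beta t) \right)
  = 2 e^{\alpha t} \left( p \cos\beta t - q \sin\beta t \right).
```

So whenever the coefficients of $e^{(\alpha + i\beta)t}$ and $e^{(\alpha - i\beta)t}$ in a solution are complex conjugates of each other, their sum is a real combination of $e^{\alpha t}\cos\beta t$ and $e^{\alpha t}\sin\beta t$.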


Example 12.4.2. Now let us solve the 3 x 3 square first-order linear system of
homogeneous differential equations given by the three equations

$$\begin{aligned} \frac{dx}{dt} &= -2x(t) + 9y(t) - z(t) \\ \frac{dy}{dt} &= -4x(t) + y(t) - \pi z(t) \\ \frac{dz}{dt} &= 5x(t) - 3y(t) + 4z(t) \end{aligned} \tag{12.18}$$

where x(t), y(t), and z(t) are three unknown functions of t with the initial conditions

$$x(0) = -6, \quad y(0) = 10, \quad z(0) = 7 \tag{12.19}$$

This system of three differential equations can be written as the single matrix differential
equation (12.7), where

$$X(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}, \qquad \frac{dX}{dt} = \begin{bmatrix} dx/dt \\ dy/dt \\ dz/dt \end{bmatrix}, \qquad A = \begin{bmatrix} -2 & 9 & -1 \\ -4 & 1 & -\pi \\ 5 & -3 & 4 \end{bmatrix}$$

The three scalar initial conditions become the single-column vector initial condition

$$X(0) = \begin{bmatrix} x(0) \\ y(0) \\ z(0) \end{bmatrix} = \begin{bmatrix} -6 \\ 10 \\ 7 \end{bmatrix}$$

Now let us have Mathematica compute the eigenvalues and eigenvectors of A. Since A is
not a real symmetric matrix, just a real 3 x 3 matrix, its eigenvalues and eigenvectors
need not all be real. Any nonreal eigenvalues must occur in complex conjugate pairs, and
since the degree of A's characteristic polynomial is 3, which is odd, at least one
eigenvalue must be real.


656 


A = {{-2., 9., -1.}, {-4., 1., -Pi}, {5., -3., 4.}};
(Q = Chop[Transpose[N[Eigenvectors[A]]]]) // MatrixForm


0.717486                  0.717486                  -0.624146
0.221905 + 0.34013 i      0.221905 - 0.34013 i      -0.0703089
-0.420584 - 0.378672 i    -0.420584 + 0.378672 i    0.778138


EigVals = N[Eigenvalues[A]]
{1.36972 + 4.7943 i, 1.36972 - 4.7943 i, 0.26056}


(Diag = Chop[DiagonalMatrix[EigVals]]) // MatrixForm


1.36972 + 4.7943 i    0                     0
0                     1.36972 - 4.7943 i    0
0                     0                     0.26056


Chop[Q.Diag.Inverse[Q]] // MatrixForm


-2.    9.     -1.
-4.    1.     -3.14159
5.     -3.    4.


A // MatrixForm


-2.    9.     -1.
-4.    1.     -π
5.     -3.    4.


We see from the last two computations that A is diagonalizable for the Q and D (or Diag) 
defined above. Now we can solve our first-order linear system of differential equations: 


(eDiag = DiagonalMatrix[E^(EigVals t)]) // MatrixForm


e^((1.36972 + 4.7943 i) t)    0.                            0.
0.                            e^((1.36972 - 4.7943 i) t)    0.
0.                            0.                            e^(0.26056 t)


eAt = Chop[Simplify[Q.eDiag.Inverse[Q]]]


{{-0.275017 e^(0.26056 t) + (0.637509 - 0.383243 i) e^((1.36972 - 4.7943 i) t) +
(0.637509 + 0.383243 i) e^((1.36972 + 4.7943 i) t),
-1.26592 e^(0.26056 t) + (0.632961 + 0.792179 i) e^((1.36972 - 4.7943 i) t) +
(0.632961 - 0.792179 i) e^((1.36972 + 4.7943 i) t),
-1.13708 e^(0.26056 t) + (0.568538 - 0.235822 i) e^((1.36972 - 4.7943 i) t) +
(0.568538 + 0.235822 i) e^((1.36972 + 4.7943 i) t)},

{-0.0309802 e^(0.26056 t) + (0.0154901 - 0.420746 i) e^((1.36972 - 4.7943 i) t) +
(0.0154901 + 0.420746 i) e^((1.36972 + 4.7943 i) t),
-0.142604 e^(0.26056 t) + (0.571302 - 0.055054 i) e^((1.36972 - 4.7943 i) t) +
(0.571302 + 0.055054 i) e^((1.36972 + 4.7943 i) t),
-0.12809 e^(0.26056 t) + (0.0640448 - 0.342455 i) e^((1.36972 - 4.7943 i) t) +
(0.0640448 + 0.342455 i) e^((1.36972 + 4.7943 i) t)},

{0.342871 e^(0.26056 t) - (0.171435 - 0.561114 i) e^((1.36972 - 4.7943 i) t) -
(0.171435 + 0.561114 i) e^((1.36972 + 4.7943 i) t),
1.57826 e^(0.26056 t) - (0.789128 + 0.130307 i) e^((1.36972 - 4.7943 i) t) -
(0.789128 - 0.130307 i) e^((1.36972 + 4.7943 i) t),
1.41762 e^(0.26056 t) - (0.20881 - 0.438297 i) e^((1.36972 - 4.7943 i) t) -
(0.20881 + 0.438297 i) e^((1.36972 + 4.7943 i) t)}}


X0 = {{-6}, {10}, {7}};
X = Simplify[eAt.X0]
{{-18.9686 e^(0.26056 t) + (6.48432 + 8.5705 i) e^((1.36972 - 4.7943 i) t) +
(6.48432 - 8.5705 i) e^((1.36972 + 4.7943 i) t)},
{-2.13678 e^(0.26056 t) + (6.06839 - 0.423251 i) e^((1.36972 - 4.7943 i) t) +
(6.06839 + 0.423251 i) e^((1.36972 + 4.7943 i) t)},
{23.6487 e^(0.26056 t) - (8.32434 + 1.60168 i) e^((1.36972 - 4.7943 i) t) -
(8.32434 - 1.60168 i) e^((1.36972 + 4.7943 i) t)}}


Chop[X /. t -> 0]
{{-6.}, {10.}, {7.}}


Our solution functions x(t), y(t), and z(t) must all be real functions since A is real and the
initial conditions are all real. In order to get real functions from the solution X above, we
apply the real part command, Re, to the components of X. We must also tell Mathematica
to assume that t is a real variable, which it is. This is done by using the ComplexExpand
command in conjunction with Re. We now recall Euler's formula:


\[
e^{a + ib} = e^{a}\left(\cos(b) + i\sin(b)\right)
\]


where a and b are both real constants or real functions of t. Also, the real part of the
product of two complex numbers a + ib and c + id is ac - bd, while its imaginary part is
ad + bc. The same two formulas work if one of the two factors is not a complex number
but a complex column vector.
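
As a worked instance of these two facts (a sketch that uses only the rounded coefficients printed in the first component of X above), the two complex conjugate terms combine into real sine and cosine terms as follows. The result matches the trigonometric part of the first entry of XNew.

```latex
\[
\begin{aligned}
&(6.48432 + 8.5705\,i)\,e^{(1.36972 - 4.7943\,i)t} + (6.48432 - 8.5705\,i)\,e^{(1.36972 + 4.7943\,i)t} \\
&\quad = 2\,\Re\!\left[(6.48432 + 8.5705\,i)\,e^{1.36972\,t}\bigl(\cos(4.7943\,t) - i\sin(4.7943\,t)\bigr)\right] \\
&\quad = e^{1.36972\,t}\bigl(12.9686\cos(4.7943\,t) + 17.141\sin(4.7943\,t)\bigr)
\end{aligned}
\]
```

The first equality holds because the two terms are complex conjugates of each other for real t, so their sum is twice the real part of either one.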


(XNew = ComplexExpand[Re[Simplify[X]]]) // MatrixForm


-18.9686 e^(0.26056 t) + 12.9686 e^(1.36972 t) Cos[4.7943 t] + 17.141 e^(1.36972 t) Sin[4.7943 t]
-2.13678 e^(0.26056 t) + 12.1368 e^(1.36972 t) Cos[4.7943 t] - 0.846503 e^(1.36972 t) Sin[4.7943 t]
23.6487 e^(0.26056 t) - 16.6487 e^(1.36972 t) Cos[4.7943 t] - 3.20336 e^(1.36972 t) Sin[4.7943 t]


XNew /. t -> 0


{{-6.}, {10.}, {7.}}


Now we verify that our solution column XNew of these real functions satisfies the
differential equation along with the initial condition.


(DXNew = Simplify[D[XNew, t]]) // MatrixForm


-4.94246 e^(0.26056 t) + 99.9425 e^(1.36972 t) Cos[4.7943 t] - 38.6972 e^(1.36972 t) Sin[4.7943 t]
-0.55676 e^(0.26056 t) + 12.5656 e^(1.36972 t) Cos[4.7943 t] - 59.3468 e^(1.36972 t) Sin[4.7943 t]
6.16189 e^(0.26056 t) - 38.1619 e^(1.36972 t) Cos[4.7943 t] + 75.4311 e^(1.36972 t) Sin[4.7943 t]


Simplify[A.XNew] // MatrixForm


-4.94246 e^(0.26056 t) + 99.9425 e^(1.36972 t) Cos[4.7943 t] - 38.6972 e^(1.36972 t) Sin[4.7943 t]
-0.55676 e^(0.26056 t) + 12.5656 e^(1.36972 t) Cos[4.7943 t] - 59.3468 e^(1.36972 t) Sin[4.7943 t]
6.16189 e^(0.26056 t) - 38.1619 e^(1.36972 t) Cos[4.7943 t] + 75.4311 e^(1.36972 t) Sin[4.7943 t]


As the next part of this example, let us plot this real solution X(t) as a curve in space for
t ∈ [-1, 1]. This plot, depicted in Figure 12.2, will be a parametric plot of a space curve.
Also, the initial condition point X0 is plotted.


Figure 12.2: Solution to (12.18) with initial conditions (12.19). 


PlotX0 = Graphics3D[{Red, PointSize[0.03], Point[Flatten[X0]]}];


PlotSoln = ParametricPlot3D[Flatten[XNew], {t, -1, 1}, PlotRange -> All,
PlotStyle -> {Blue, Thickness[0.007]}];
Show[PlotSoln, PlotX0]


We will now have Mathematica solve this system using DSolve. After that, we will take 
the real part of the three solutions and compare the result to X computed previously. 




RHS = Flatten[A.{{x[t]}, {y[t]}, {z[t]}}];

Solns = DSolve[{x'[t] == RHS[[1]], y'[t] == RHS[[2]], z'[t] == RHS[[3]],
x[0] == -6, y[0] == 10, z[0] == 7}, {x[t], y[t], z[t]}, t][[1]];

Chop[{x[t], y[t], z[t]} /. Solns]


{-18.9686 e^(0.26056 t) + 12.9686 e^(1.36972 t) Cos[4.7943 t] + 17.141 e^(1.36972 t) Sin[4.7943 t],
-2.13678 e^(0.26056 t) + 12.1368 e^(1.36972 t) Cos[4.7943 t] - 0.846503 e^(1.36972 t) Sin[4.7943 t],
23.6487 e^(0.26056 t) - 16.6487 e^(1.36972 t) Cos[4.7943 t] - 3.20336 e^(1.36972 t) Sin[4.7943 t]}


Flatten[XNew]


{-18.9686 e^(0.26056 t) + 12.9686 e^(1.36972 t) Cos[4.7943 t] + 17.141 e^(1.36972 t) Sin[4.7943 t],
-2.13678 e^(0.26056 t) + 12.1368 e^(1.36972 t) Cos[4.7943 t] - 0.846503 e^(1.36972 t) Sin[4.7943 t],
23.6487 e^(0.26056 t) - 16.6487 e^(1.36972 t) Cos[4.7943 t] - 3.20336 e^(1.36972 t) Sin[4.7943 t]}


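Now let us recompute the solution X(t) using


\[
X(t) = \Re\left( a_1 e^{\lambda_1 t} X_1 + a_2 e^{\lambda_2 t} X_2 + a_3 e^{\lambda_3 t} X_3 \right)
\]


where


\[
a_1 X_1 + a_2 X_2 + a_3 X_3 = X_0
\]


which results in the following matrix equation to find the a_k values:


\[
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = Q^{-1} X_0
\]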


X1 = Q[[All, 1]]
{0.717486, 0.221905 + 0.34013 i, -0.420584 - 0.378672 i}


X2 = Q[[All, 2]]
{0.717486, 0.221905 - 0.34013 i, -0.420584 + 0.378672 i}


X3 = Q[[All, 3]]
{-0.624146, -0.0703089, 0.778138}


LeftSideSystem = a X1 + b X2 + c X3


{0.717486 a + 0.717486 b - 0.624146 c,
(0.221905 + 0.34013 i) a + (0.221905 - 0.34013 i) b - 0.0703089 c,
(-0.420584 - 0.378672 i) a - (0.420584 - 0.378672 i) b + 0.778138 c}


CoeffsA = Chop[Solve[{LeftSideSystem == Flatten[X0]}, {a, b, c}][[1]]]


{a -> 9.03755 - 11.9452 i, b -> 9.03755 + 11.9452 i, c -> 30.3914}
Flatten[Chop[Inverse[Q].X0]]


{9.03755 - 11.9452 i, 9.03755 + 11.9452 i, 30.3914}

ComplexExpand[Re[(a Exp[EigVals[[1]] t] X1 + b Exp[EigVals[[2]] t] X2 +
c Exp[EigVals[[3]] t] X3) /. Flatten[CoeffsA]]]

{-18.9686 e^(0.26056 t) + 12.9686 e^(1.36972 t) Cos[4.7943 t] + 17.141 e^(1.36972 t) Sin[4.7943 t],
-2.13678 e^(0.26056 t) + 12.1368 e^(1.36972 t) Cos[4.7943 t] - 0.846503 e^(1.36972 t) Sin[4.7943 t],
23.6487 e^(0.26056 t) - 16.6487 e^(1.36972 t) Cos[4.7943 t] - 3.20336 e^(1.36972 t) Sin[4.7943 t]}


Flatten[XNew]

{-18.9686 e^(0.26056 t) + 12.9686 e^(1.36972 t) Cos[4.7943 t] + 17.141 e^(1.36972 t) Sin[4.7943 t],
-2.13678 e^(0.26056 t) + 12.1368 e^(1.36972 t) Cos[4.7943 t] - 0.846503 e^(1.36972 t) Sin[4.7943 t],
23.6487 e^(0.26056 t) - 16.6487 e^(1.36972 t) Cos[4.7943 t] - 3.20336 e^(1.36972 t) Sin[4.7943 t]}


In our next example of solving first-order linear systems of differential
equations, we do a nonhomogeneous example. In the nonhomogeneous
case, the system of linear differential equations is of the form


\[
\frac{dX}{dt} = AX(t) + F(t)
\tag{12.20}
\]


where F(t) is a column of real functions of t. In a homogeneous system,
F(t) does not appear, or equivalently, F(t) is the constant zero column.


In solving a homogeneous system of linear differential equations, a linear
combination Y(t) = aX_1(t) + bX_2(t) of two solutions X_1(t) and X_2(t) to the
system is also a solution, while in a nonhomogeneous system, this is
generally not the case. This is the same when solving homogeneous AX = 0 versus
nonhomogeneous AX = B linear systems of equations, for B ≠ 0.
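
To make the contrast concrete, here is a short check using only the definitions above, with a and b arbitrary constants. Suppose X_1(t) and X_2(t) both solve the nonhomogeneous system dX/dt = AX(t) + F(t), and let Y(t) = aX_1(t) + bX_2(t). Then

```latex
\[
\frac{dY}{dt}
= a\,\frac{dX_1}{dt} + b\,\frac{dX_2}{dt}
= a\,\bigl(AX_1 + F\bigr) + b\,\bigl(AX_2 + F\bigr)
= AY(t) + (a + b)\,F(t)
\]
```

so Y(t) solves the same nonhomogeneous system only when (a + b)F(t) = F(t), for instance when a + b = 1. When F(t) is the zero column, the extra term vanishes and every linear combination is again a solution.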


Example 12.4.3. Let us solve the 3 × 3 square first-order linear system of
nonhomogeneous differential equations given by the three equations


\[
\begin{aligned}
\frac{dx}{dt} &= 5x(t) - 9y(t) + z(t) + \sin(t) \\
\frac{dy}{dt} &= -9x(t) + 2y(t) + \pi z(t) - e^{t} \\
\frac{dz}{dt} &= x(t) + \pi y(t) - 4z(t) + \cos(3t)
\end{aligned}
\tag{12.21}
\]


where x(t), y(t), and z(t) are three unknown functions of t with the initial conditions


\[
x(0) = -2, \quad y(0) = 4, \quad z(0) = 10
\tag{12.22}
\]


This system of three differential equations can be written as a single matrix differential
equation of the form (12.20), where


\[
X(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix}, \quad
A = \begin{pmatrix} 5 & -9 & 1 \\ -9 & 2 & \pi \\ 1 & \pi & -4 \end{pmatrix}, \quad
F(t) = \begin{pmatrix} \sin(t) \\ -e^{t} \\ \cos(3t) \end{pmatrix}, \quad
X(0) = \begin{pmatrix} -2 \\ 4 \\ 10 \end{pmatrix}
\]


In order to solve this matrix differential equation (12.20) for the column X(t), we first
look at the simple analogous situation that we already know how to solve, namely


\[
\frac{dw}{dt} = a\,w(t) + f(t)
\tag{12.23}
\]


where w(t) and f(t) are normal functions of t and a ∈ ℝ. This first-order linear differential
equation (12.23) has as its solution


\[
w(t) = e^{at}\int e^{-at} f(t)\,dt + e^{at} C
\tag{12.24}
\]


where C is an arbitrary constant. You can easily check that this works by plugging it back
into the differential equation for w(t).
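
The check can be written out in one line. Differentiating the proposed solution w(t) = e^{at} ∫ e^{-at} f(t) dt + e^{at} C with the product rule, and using the fact that the derivative of ∫ e^{-at} f(t) dt is e^{-at} f(t), gives

```latex
\[
\frac{dw}{dt}
= a\,e^{at}\!\int e^{-at} f(t)\,dt + e^{at}\,e^{-at} f(t) + a\,e^{at} C
= a\left(e^{at}\!\int e^{-at} f(t)\,dt + e^{at} C\right) + f(t)
= a\,w(t) + f(t)
\]
```

which is exactly the differential equation dw/dt = a w(t) + f(t).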


This solution was arrived at using the method of integrating factors. In case you have not
seen this method or do not recall it now, let us go through the method. If you rewrite the
differential equation (12.23) as


\[
\frac{dw}{dt} - a\,w(t) = f(t)
\tag{12.25}
\]


then you can see that the LHS of our new equation closely resembles the result of the
product derivative rule, but it is not quite the same. If you multiply equation (12.25) by
e^{-at}, then we have


\[
e^{-at}\frac{dw}{dt} - a\,e^{-at} w(t) = e^{-at} f(t)
\]


or


\[
\frac{d}{dt}\left(e^{-at} w(t)\right) = e^{-at} f(t)
\tag{12.26}
\]


by the product derivative rule. To get rid of the d/dt on the LHS of equation (12.26), we
must integrate both sides with respect to t. This yields


\[
e^{-at} w(t) = \int e^{-at} f(t)\,dt + C
\]


for C an arbitrary constant. On multiplying both sides of this equation by e^{at}, we arrive at
the solution given in equation (12.24).


Now, if the differential equation (12.23) has the initial condition w(0) = w_0, then the
constant C is no longer arbitrary, but a specific value determined by the initial condition.
So we can determine C's value by plugging the initial condition information into our
solution (12.24). This gives


\[
w(0) = w_0 = e^{a \cdot 0}\left[\int e^{-at} f(t)\,dt\right]_{t=0} + e^{a \cdot 0}\,C
= \left[\int e^{-at} f(t)\,dt\right]_{t=0} + C
\]


Solving for the unknown constant C gives


\[
C = w_0 - \left[\int e^{-at} f(t)\,dt\right]_{t=0}
\]


So, in the case of an initial condition w(0) = w_0, the solution to our differential equation
(12.23) is


\[
w(t) = e^{at}\int e^{-at} f(t)\,dt + e^{at}\left(w_0 - \left[\int e^{-at} f(t)\,dt\right]_{t=0}\right)
\tag{12.27}
\]
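
As a small worked example of this formula (the choice f(t) = 1 with a ≠ 0 is ours, purely for illustration), the antiderivative is ∫ e^{-at} dt = -e^{-at}/a, whose value at t = 0 is -1/a. Substituting into the solution formula gives

```latex
\[
w(t) = e^{at}\left(-\frac{e^{-at}}{a}\right) + e^{at}\left(w_0 + \frac{1}{a}\right)
= \left(w_0 + \frac{1}{a}\right)e^{at} - \frac{1}{a}
\]
```

One can confirm directly that w(0) = w_0 and that dw/dt = a(w_0 + 1/a)e^{at} = a w(t) + 1, as required.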


This tells us that we should expect the matrix differential equation (12.20) with initial
condition X(0) = [x(0) y(0) z(0)]^T to have the solution


\[
X(t) = e^{At}\left(Z(t) + P_0\right)
\tag{12.28}
\]


where


\[
Z(t) = \int e^{-At} F(t)\,dt, \qquad P_0 = X_0 - Z(0)
\tag{12.29}
\]


This solution is correct regardless of whether A is diagonalizable.
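
The following is a minimal sketch of that claim, not part of the worked example: it applies X(t) = e^{At}(Z(t) + P_0) to a small matrix that is not diagonalizable. The matrix, forcing column, and initial column are hypothetical test data, and the built-in MatrixExp is used here to obtain e^{At} directly, which is an assumption beyond the diagonalization approach used in this section.

```mathematica
(* Hypothetical test data: A is a 2 x 2 Jordan block, so it is NOT diagonalizable *)
A = {{1, 1}, {0, 1}};
F = {{t}, {1}};
X0 = {{1}, {2}};

(* Z(t) = integral of e^(-A t).F(t) dt; MatrixExp computes the matrix exponential *)
Z = Integrate[MatrixExp[-A t].F, t];
P0 = X0 - (Z /. t -> 0);

(* X(t) = e^(A t).(Z(t) + P0) *)
X = Simplify[MatrixExp[A t].(Z + P0)];

(* Both checks should reduce to zero columns if the formula holds *)
Simplify[D[X, t] - (A.X + F)]
(X /. t -> 0) - X0
```

If both final expressions come back as columns of zeros, the formula has produced a solution that satisfies the system and its initial condition even though no invertible matrix of eigenvectors exists for this A.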


When A is diagonalizable with A = QDQ^{-1}, then e^{At} = Qe^{Dt}Q^{-1}, which allows our
solution X(t) to be simplified to


\[
X(t) = Q e^{Dt}\left(W(t) + R_0\right)
\tag{12.30}
\]


where


\[
W(t) = \int e^{-Dt} Q^{-1} F(t)\,dt, \qquad R_0 = Q^{-1} X_0 - W(0)
\tag{12.31}
\]


Let us now have Mathematica compute all of this for us by diagonalizing A again (note
that this is the same A as in Example 12.4.1), and then integrating for us. The integral of a
matrix M(t) of functions of t is the corresponding matrix of the integrals of M(t)'s entries.


Since A is a real symmetric matrix, all of its eigenvalues and eigenvectors will be real.
The eigenvalues of -A are the negatives of the eigenvalues of A for the same eigenvectors
of A; that is, if A = QDQ^{-1}, then -A = Q(-D)Q^{-1}.


(F = {{Sin[t]}, {-E^t}, {Cos[3 t]}}) // MatrixForm


Sin[t]
-e^t
Cos[3 t]


A = {{5., -9, 1}, {-9, 2, Pi}, {1, Pi, -4}};
(Q = Transpose[N[Eigenvectors[A]]]) // MatrixForm


0.752282      -0.48108     0.450148
-0.654218     -0.626257    0.42403
-0.0779158    0.613485     0.785853


EigVals = N[Eigenvalues[A]]
{12.7232, -7.99117, -1.73205}
(Diag = DiagonalMatrix[EigVals]) // MatrixForm
12.7232    0.          0.
0.         -7.99117    0.
0.         0.          -1.73205
Q.Diag.Inverse[Q] // MatrixForm
5.     -9.        1.
-9.    2.         3.14159
1.     3.14159    -4.


A // MatrixForm


5.    -9    1
-9    2     π
1     π     -4


We see from the last two computations that A is diagonalizable for the Q and D (or Diag)
defined above. Now we can solve our first-order linear system of differential equations:


(eDiag = DiagonalMatrix[E^(EigVals t)]) // MatrixForm


e^(12.7232 t)    0.                0.
0.               e^(-7.99117 t)    0.
0.               0.                e^(-1.73205 t)


eAt = Chop[Simplify[Q.eDiag.Inverse[Q]]]


{{0.231438 e^(-7.99117 t) + 0.202633 e^(-1.73205 t) + 0.565929 e^(12.7232 t),
0.30128 e^(-7.99117 t) + 0.190876 e^(-1.73205 t) - 0.492156 e^(12.7232 t),
-0.295136 e^(-7.99117 t) + 0.35375 e^(-1.73205 t) - 0.0586147 e^(12.7232 t)},
{0.30128 e^(-7.99117 t) + 0.190876 e^(-1.73205 t) - 0.492156 e^(12.7232 t),
0.392198 e^(-7.99117 t) + 0.179801 e^(-1.73205 t) + 0.428001 e^(12.7232 t),
-0.384199 e^(-7.99117 t) + 0.333225 e^(-1.73205 t) + 0.0509739 e^(12.7232 t)},
{-0.295136 e^(-7.99117 t) + 0.35375 e^(-1.73205 t) - 0.0586147 e^(12.7232 t),
-0.384199 e^(-7.99117 t) + 0.333225 e^(-1.73205 t) + 0.0509739 e^(12.7232 t),
0.376364 e^(-7.99117 t) + 0.617565 e^(-1.73205 t) + 0.00607087 e^(12.7232 t)}}


(eNegDiag = DiagonalMatrix[E^(-EigVals t)]) // MatrixForm


e^(-12.7232 t)    0.               0.
0.                e^(7.99117 t)    0.
0.                0.               e^(1.73205 t)


eNegAt = Chop[Simplify[Q.eNegDiag.Inverse[Q]]]
{{0.565929 e^(-12.7232 t) + 0.202633 e^(1.73205 t) + 0.231438 e^(7.99117 t),
-0.492156 e^(-12.7232 t) + 0.190876 e^(1.73205 t) + 0.30128 e^(7.99117 t),
-0.0586147 e^(-12.7232 t) + 0.35375 e^(1.73205 t) - 0.295136 e^(7.99117 t)},
{-0.492156 e^(-12.7232 t) + 0.190876 e^(1.73205 t) + 0.30128 e^(7.99117 t),
0.428001 e^(-12.7232 t) + 0.179801 e^(1.73205 t) + 0.392198 e^(7.99117 t),
0.0509739 e^(-12.7232 t) + 0.333225 e^(1.73205 t) - 0.384199 e^(7.99117 t)},
{-0.0586147 e^(-12.7232 t) + 0.35375 e^(1.73205 t) - 0.295136 e^(7.99117 t),
0.0509739 e^(-12.7232 t) + 0.333225 e^(1.73205 t) - 0.384199 e^(7.99117 t),
0.00607087 e^(-12.7232 t) + 0.617565 e^(1.73205 t) + 0.376364 e^(7.99117 t)}}


X0 = {{-2}, {4}, {10}}
{{-2}, {4}, {10}}


Z = Chop[Expand[Integrate[eNegAt.F, t]]]


{{-0.0419813 e^(-11.7232 t) - 0.0698656 e^(2.73205 t) - 0.0335084 e^(8.99117 t) -
0.0034745 e^(-12.7232 t) Cos[t] - 0.0506584 e^(1.73205 t) Cos[t] -
0.00356834 e^(7.99117 t) Cos[t] + 0.00436427 e^(-12.7232 t) Cos[3. t] +
0.0510594 e^(1.73205 t) Cos[3. t] - 0.0323705 e^(7.99117 t) Cos[3. t] -
0.0442069 e^(-12.7232 t) Sin[t] + 0.0877428 e^(1.73205 t) Sin[t] +
0.0285152 e^(7.99117 t) Sin[t] - 0.00102905 e^(-12.7232 t) Sin[3. t] +
0.0884376 e^(1.73205 t) Sin[3. t] - 0.0121524 e^(7.99117 t) Sin[3. t]},

{0.036508 e^(-11.7232 t) - 0.065812 e^(2.73205 t) - 0.0436203 e^(8.99117 t) +
0.00302158 e^(-12.7232 t) Cos[t] - 0.0477192 e^(1.73205 t) Cos[t] -
0.00464516 e^(7.99117 t) Cos[t] - 0.00379536 e^(-12.7232 t) Cos[3. t] +
0.0480969 e^(1.73205 t) Cos[3. t] - 0.0421391 e^(7.99117 t) Cos[3. t] +
0.0384442 e^(-12.7232 t) Sin[t] + 0.0826519 e^(1.73205 t) Sin[t] +
0.0371203 e^(7.99117 t) Sin[t] + 0.000894905 e^(-12.7232 t) Sin[3. t] +
0.0833064 e^(1.73205 t) Sin[3. t] - 0.0158196 e^(7.99117 t) Sin[3. t]},

{0.00434811 e^(-11.7232 t) - 0.121969 e^(2.73205 t) + 0.0427307 e^(8.99117 t) +
0.000359863 e^(-12.7232 t) Cos[t] - 0.0884378 e^(1.73205 t) Cos[t] +
0.00455043 e^(7.99117 t) Cos[t] - 0.000452018 e^(-12.7232 t) Cos[3. t] +
0.0891378 e^(1.73205 t) Cos[3. t] + 0.0412797 e^(7.99117 t) Cos[3. t] +
0.00457862 e^(-12.7232 t) Sin[t] + 0.153178 e^(1.73205 t) Sin[t] -
0.0363633 e^(7.99117 t) Sin[t] + 0.000106581 e^(-12.7232 t) Sin[3. t] +
0.154391 e^(1.73205 t) Sin[3. t] + 0.015497 e^(7.99117 t) Sin[3. t]}}


P0 = Re[X0 - Z /. t -> 0]
{{-1.82}, {4.1201}, {10.0285}}


X = Chop[ComplexExpand[Re[Simplify[eAt.(Z + P0)]]]]

{{-2.13967 e^(-7.99117 t) + 3.96521 e^(-1.73205 t) - 0.145355 e^(1. t) - 3.64554 e^(12.7232 t) -
0.0577013 Cos[t] + 0.0230531 Cos[3. t] + 0.0720511 Sin[t] + 0.0752562 Sin[3. t]},
{-2.78536 e^(-7.99117 t) + 3.73514 e^(-1.73205 t) - 0.0729235 e^(1. t) + 3.17032 e^(12.7232 t) -
0.0493428 Cos[t] + 0.00216249 Cos[3. t] + 0.158216 Sin[t] + 0.0683817 Sin[3. t]},
{2.72855 e^(-7.99117 t) + 6.92232 e^(-1.73205 t) - 0.0748903 e^(1. t) + 0.377578 e^(12.7232 t) -
0.0835275 Cos[t] + 0.129965 Cos[3. t] + 0.121394 Sin[t] + 0.169995 Sin[3. t]}}


X /. t -> 0
{{-2.}, {4.}, {10.}}


DX = Simplify[D[X, t]]
{{17.0984 e^(-7.99117 t) - 6.86793 e^(-1.73205 t) - 0.145355 e^(1. t) - 46.383 e^(12.7232 t) +
0.0720511 Cos[t] + 0.225769 Cos[3. t] + 0.0577013 Sin[t] - 0.0691594 Sin[3. t]},
{22.2583 e^(-7.99117 t) - 6.46944 e^(-1.73205 t) - 0.0729235 e^(1. t) + 40.3367 e^(12.7232 t) +
0.158216 Cos[t] + 0.205145 Cos[3. t] + 0.0493428 Sin[t] - 0.00648748 Sin[3. t]},
{-21.8043 e^(-7.99117 t) - 11.9898 e^(-1.73205 t) - 0.0748903 e^(1. t) + 4.804 e^(12.7232 t) +
0.121394 Cos[t] + 0.509985 Cos[3. t] + 0.0835275 Sin[t] - 0.389896 Sin[3. t]}}


Simplify[A.X + F]
{{17.0984 e^(-7.99117 t) - 6.86793 e^(-1.73205 t) - 0.145355 e^(1. t) - 46.383 e^(12.7232 t) +
0.0720511 Cos[t] + 0.225769 Cos[3. t] + 0.0577013 Sin[t] - 0.0691594 Sin[3. t]},
{22.2583 e^(-7.99117 t) - 6.46944 e^(-1.73205 t) - 0.0729235 e^(1. t) + 40.3367 e^(12.7232 t) +
0.158216 Cos[t] + 0.205145 Cos[3. t] + 0.0493428 Sin[t] - 0.00648748 Sin[3. t]},
{-21.8043 e^(-7.99117 t) - 11.9898 e^(-1.73205 t) - 0.0748903 e^(1. t) + 4.804 e^(12.7232 t) +
0.121394 Cos[t] + 0.509985 Cos[3. t] + 0.0835275 Sin[t] - 0.389896 Sin[3. t]}}


Our three solution functions x(t), y(t), and z(t), to the linear system of differential
equations, are X[[1]], X[[2]], and X[[3]], respectively, as seen in the variable X above. We
also checked that X(t) is the solution to this system of differential equations and satisfies
the initial conditions. Let us plot, as shown in Figure 12.3, this real solution as a curve in
space for t ∈ [-0.25, 0.25].


Figure 12.3: Solution to the ODE system (12.21), with initial 
conditions given in equation (12.22). 


PlotX0 = Graphics3D[{Red, PointSize[0.03], Point[Flatten[X0]]}];


PlotSoln = ParametricPlot3D[Flatten[X], {t, -0.25, 0.25}, PlotRange -> All,
PlotStyle -> {Blue, Thickness[0.007]}];


Show[PlotSoln, PlotX0]


Let us use DSolve to check again that our solution X, as given above, is correct. 


RHS = Flatten[A.{{x[t]}, {y[t]}, {z[t]}} + F];

Solns = DSolve[{x'[t] == RHS[[1]], y'[t] == RHS[[2]], z'[t] == RHS[[3]],
x[0] == -2, y[0] == 4., z[0] == 10.}, {x[t], y[t], z[t]}, t][[1]];

Chop[TrigReduce[ComplexExpand[Re[{x[t], y[t], z[t]} /. Solns]]], 10^-7]

{-2.13981 e^(-7.99117 t) + 3.96525 e^(-1.73205 t) - 0.145287 e^(1. t) - 3.64566 e^(12.7232 t) -
0.0577025 Cos[1. t] + 0.0230067 Cos[3. t] + 0.072058 Sin[1. t] + 0.0752562 Sin[3. t],
-2.78554 e^(-7.99117 t) + 3.73518 e^(-1.73205 t) - 0.0730616 e^(1. t) + 3.17043 e^(12.7232 t) -
0.0493405 Cos[1. t] + 0.00215926 Cos[3. t] + 0.158215 Sin[1. t] + 0.0683817 Sin[3. t],
2.72873 e^(-7.99117 t) + 6.9224 e^(-1.73205 t) - 0.0749271 e^(1. t) + 0.377591 e^(12.7232 t) -
0.083528 Cos[1. t] + 0.129907 Cos[3. t] + 0.121393 Sin[1. t] + 0.169995 Sin[3. t]}


X
{{-2.13967 e^(-7.99117 t) + 3.96521 e^(-1.73205 t) - 0.145355 e^(1. t) - 3.64554 e^(12.7232 t) -
0.0577013 Cos[t] + 0.0230531 Cos[3. t] + 0.0720511 Sin[t] + 0.0752562 Sin[3. t]},
{-2.78536 e^(-7.99117 t) + 3.73514 e^(-1.73205 t) - 0.0729235 e^(1. t) + 3.17032 e^(12.7232 t) -
0.0493428 Cos[t] + 0.00216249 Cos[3. t] + 0.158216 Sin[t] + 0.0683817 Sin[3. t]},
{2.72855 e^(-7.99117 t) + 6.92232 e^(-1.73205 t) - 0.0748903 e^(1. t) + 0.377578 e^(12.7232 t) -
0.0835275 Cos[t] + 0.129965 Cos[3. t] + 0.121394 Sin[t] + 0.169995 Sin[3. t]}}


In the last example, we will look at solving a single kth-order constant
coefficient linear nonhomogeneous differential equation of the form


\[
\frac{d^k y}{dt^k} + a_{k-1}\frac{d^{k-1} y}{dt^{k-1}} + \cdots + a_1\frac{dy}{dt} + a_0 y = f(t)
\tag{12.32}
\]


for real constants a_{k-1}, a_{k-2}, ..., a_0, with the initial conditions


\[
y(0) = b_0, \quad \frac{dy}{dt}(0) = b_1, \quad \ldots, \quad \frac{d^{k-1} y}{dt^{k-1}}(0) = b_{k-1}
\tag{12.33}
\]


Here we will assume that each of the b_i scalars is real as well. The single
equation (12.32) can be written equivalently as a nonhomogeneous
first-order linear system of differential equations in the new k unknown
functions


\[
x_1(t) = y(t), \quad x_2(t) = \frac{dy}{dt}, \quad \ldots, \quad x_k(t) = \frac{d^{k-1} y}{dt^{k-1}}
\]


The nonhomogeneous first-order linear system of differential equations is
then


\[
\begin{aligned}
\frac{dx_1}{dt} &= x_2(t), \quad \frac{dx_2}{dt} = x_3(t), \quad \ldots, \quad \frac{dx_{k-1}}{dt} = x_k(t) \\
\frac{dx_k}{dt} &= -\left(a_0 x_1(t) + a_1 x_2(t) + \cdots + a_{k-1} x_k(t)\right) + f(t)
\end{aligned}
\tag{12.34}
\]


with the initial conditions


\[
x_1(0) = b_0, \quad x_2(0) = b_1, \quad \ldots, \quad x_k(0) = b_{k-1}
\tag{12.35}
\]
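
The conversion from a single kth-order equation to a first-order system can be sketched in Mathematica as follows. This is an illustrative sketch only: the names coeffs and k and the sample coefficient values are our own assumptions, chosen so that the equation is y''' + 5y'' - 2y' + 3y = f(t).

```mathematica
(* Assumed input: coefficients {a0, a1, ..., a(k-1)} of the monic kth-order equation *)
coeffs = {3, -2, 5};   (* illustrative values: y''' + 5 y'' - 2 y' + 3 y = f(t) *)
k = Length[coeffs];

(* Rows 1..k-1 encode dx_i/dt = x_(i+1); the last row is -{a0, ..., a(k-1)} *)
A = Append[Table[If[j == i + 1, 1, 0], {i, k - 1}, {j, k}], -coeffs];
A // MatrixForm

(* Forcing column: zeros except for f(t) in the last entry *)
F = Append[ConstantArray[{0}, k - 1], {f[t]}]
```

For these sample coefficients the matrix has rows {0, 1, 0}, {0, 0, 1}, and {-3, 2, -5}, with the only nonzero forcing entry in the last row.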


Example 12.4.4. The nonhomogeneous constant coefficient differential equation that we
will solve is the third-order equation


\[
\frac{d^3 y}{dt^3} + 5\frac{d^2 y}{dt^2} - 2\frac{dy}{dt} + 3y(t) = \sin(t)
\tag{12.36}
\]


with the initial conditions


\[
y(0) = 4, \quad \frac{dy}{dt}(0) = -9, \quad \frac{d^2 y}{dt^2}(0) = 7
\tag{12.37}
\]


This third-order equation becomes the first-order linear system (12.20), where


\[
X(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix}, \quad
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -3 & 2 & -5 \end{pmatrix}, \quad
F(t) = \begin{pmatrix} 0 \\ 0 \\ \sin(t) \end{pmatrix}, \quad
X(0) = \begin{pmatrix} 4 \\ -9 \\ 7 \end{pmatrix}
\]


for x_1(t) = y(t), x_2(t) = dy/dt, and x_3(t) = d^2y/dt^2. The solution
formula that we shall use here is (12.30) with (12.31) since A is diagonalizable. We will
employ this formula since it is hoped that this form of the solution will not give us the
superfluous terms that the general formula does with unnecessary, very small
coefficients.


(F = {{0}, {0}, {Sin[t]}}) // MatrixForm


0
0
Sin[t]


A = {{0, 1, 0}, {0, 0, 1}, {-3, 2, -5}};
(Q = Transpose[N[Eigenvectors[A]]]) // MatrixForm


0.033467    -1.46118 - 1.08857 i    -1.46118 + 1.08857 i
-0.18294    0.424803 - 1.28126 i    0.424803 + 1.28126 i
1.          1.                      1.


EigVals = N[Eigenvalues[A]]
{-5.46628, 0.23314 + 0.703182 i, 0.23314 - 0.703182 i}


(Diag = Chop[DiagonalMatrix[EigVals]]) // MatrixForm


-5.46628    0                       0
0           0.23314 + 0.703182 i    0
0           0                       0.23314 - 0.703182 i


Chop[Q.Diag.Inverse[Q]] // MatrixForm


0      1.    0
0      0     1.
-3.    2.    -5.


A // MatrixForm


0     1    0
0     0    1
-3    2    -5


We see from the last two computations that A is diagonalizable for the Q and D (or Diag)
defined above. Now we can solve our first-order linear system of differential equations:


(eDiag = DiagonalMatrix[E^(EigVals t)]) // MatrixForm


e^(-5.46628 t)    0.                              0.
0.                e^((0.23314 + 0.703182 i) t)    0.
0.                0.                              e^((0.23314 - 0.703182 i) t)

QeDiag = Chop[Q.eDiag]
{{0.033467 e^(-5.46628 t), (-1.46118 - 1.08857 i) e^((0.23314 + 0.703182 i) t),
(-1.46118 + 1.08857 i) e^((0.23314 - 0.703182 i) t)},
{-0.18294 e^(-5.46628 t), (0.424803 - 1.28126 i) e^((0.23314 + 0.703182 i) t),
(0.424803 + 1.28126 i) e^((0.23314 - 0.703182 i) t)},
{1. e^(-5.46628 t), 1. e^((0.23314 + 0.703182 i) t), 1. e^((0.23314 - 0.703182 i) t)}}


(eNegDiag = DiagonalMatrix[E^(-EigVals t)]) // MatrixForm


e^(5.46628 t)    0.                               0.
0.               e^((-0.23314 - 0.703182 i) t)    0.
0.               0.                               e^((-0.23314 + 0.703182 i) t)


eNegDiagQInv = Chop[eNegDiag.Inverse[Q]]
{{0.497268 e^(5.46628 t), -0.422482 e^(5.46628 t), 0.906069 e^(5.46628 t)},
{(-0.248634 + 0.117935 i) e^((-0.23314 - 0.703182 i) t),
(0.211241 + 0.290042 i) e^((-0.23314 - 0.703182 i) t),
(0.0469654 + 0.0491132 i) e^((-0.23314 - 0.703182 i) t)},
{(-0.248634 - 0.117935 i) e^((-0.23314 + 0.703182 i) t),
(0.211241 - 0.290042 i) e^((-0.23314 + 0.703182 i) t),
(0.0469654 - 0.0491132 i) e^((-0.23314 + 0.703182 i) t)}}


Y0 = {{4}, {-9}, {7}};
W = Expand[Chop[Integrate[eNegDiagQInv.F, t]]]


{{-0.0293414 e^(5.46628 t) Cos[t] + 0.160388 e^(5.46628 t) Sin[t]},
{(-0.100714 - 0.0287399 i) e^((-0.23314 - 0.703182 i) t) Cos[t] -
(0.0032711 + 0.0775206 i) e^((-0.23314 - 0.703182 i) t) Sin[t]},
{(-0.100714 + 0.0287399 i) e^((-0.23314 + 0.703182 i) t) Cos[t] -
(0.0032711 - 0.0775206 i) e^((-0.23314 + 0.703182 i) t) Sin[t]}}


R0 = Chop[Inverse[Q].Y0 - W /. t -> 0]
{{12.1632}, {-2.46623 - 1.7661 i}, {-2.46623 + 1.7661 i}}


Y = Simplify[QeDiag.(W + R0)]
{{0.407067 e^(-5.46628 t) + (1.68108 - 5.26526 i) e^((0.23314 - 0.703182 i) t) +
(1.68108 + 5.26526 i) e^((0.23314 + 0.703182 i) t) +
(0.230769 + 0. i) Cos[t] - (0.153846 + 0. i) Sin[t]},
{-2.22514 e^(-5.46628 t) - (3.31051 + 2.40965 i) e^((0.23314 - 0.703182 i) t) -
(3.31051 - 2.40965 i) e^((0.23314 + 0.703182 i) t) -
(0.153846 + 0. i) Cos[t] - (0.230769 + 0. i) Sin[t]},
{12.1632 e^(-5.46628 t) - (2.46623 - 1.7661 i) e^((0.23314 - 0.703182 i) t) -
(2.46623 + 1.7661 i) e^((0.23314 + 0.703182 i) t) -
(0.230769 + 0. i) Cos[t] + (0.153846 + 0. i) Sin[t]}}


Chop[Y /. t -> 0] // MatrixForm


4.
-9.
7.


Our solution y(t) to the original third-order constant coefficient differential equation is the
first entry in Y above, which we denote as Y1 below. We check that Y1 satisfies the
differential equation and its initial conditions. As well, we verify that it is correct by
solving the differential equation with DSolve. Since DSolve gives us a complex result,
we must take its real part. In Figure 12.4, we plot the solution function y(t) for t ∈
[-0.5, 5.0].

Y1 = ComplexExpand[Re[Y[[1]]]]

{0.407067 e^(-5.46628 t) + 3.36216 e^(0.23314 t) Cos[0.703182 t] +
0.230769 Cos[t] - 10.5305 e^(0.23314 t) Sin[0.703182 t] - 0.153846 Sin[t]}
{Y1, D[Y1, t], D[Y1, {t, 2}]} /. t -> 0


Chop[Simplify[D[Y1, {t, 3}] + 5 D[Y1, {t, 2}] - 2 D[Y1, t] + 3 Y1]]


{1. Sin[t]}

Soln = DSolve[{y'''[t] + 5. y''[t] - 2 y'[t] + 3 y[t] == Sin[t], y[0] == 4,
y'[0] == -9, y''[0] == 7}, y[t], t];
Chop[Expand[TrigReduce[ComplexExpand[Re[y[t] /. Soln]]]]]


{0.407067 e^(-5.46628 t) + 3.36216 e^(0.23314 t) Cos[0.703182 t] +
0.230769 Cos[t] - 10.5305 e^(0.23314 t) Sin[0.703182 t] - 0.153846 Sin[t]}


Figure 12.4: Solution to the ODE (12.36), with initial conditions
given in equation (12.37).


Plot[Y1, {t, -0.5, 5}, PlotStyle -> {Blue, Thickness[0.007]}]


The last part of this section will summarize what we have discovered
about solving homogeneous and nonhomogeneous first-order square linear
systems of differential equations. But it is more than a summary, since it
will also recap what we have done in a purely matrix approach based on
the diagonalizability of the coefficient matrix A.


In this final portion of this section we also have formulas that solve the
homogeneous and nonhomogeneous problems that do not depend on the
diagonalizability of the matrix A. However, without this diagonalizability,
we may have great difficulty solving the problem unless A is of some
special form, such as upper or lower triangular.


We begin with the homogeneous case (12.7), which we rewrite for
completeness, along with the initial condition


\[
\frac{dX}{dt} = AX(t), \quad X(0) = X_0
\tag{12.38}
\]


where A is an n × n square matrix, and X(t) is an n × 1 column matrix,
which we can also think of as a vector in ℝ^n or ℂ^n. As usual, we assume
that A is diagonalizable with A = QDQ^{-1}, for D the diagonal matrix of A's
eigenvalues λ_1, λ_2, ..., λ_n, and Q an n × n matrix of A's eigenvector
columns X_1, X_2, ..., X_n corresponding to these eigenvalues for which Q is
invertible; that is, these n eigenvector columns of Q must be an
independent set in ℂ^n. This diagonalizability also says that
e^{At} = Qe^{Dt}Q^{-1}.


We have seen previously that all of the solutions X(t) to this system can be
written as


\[
X(t) = a_1 e^{\lambda_1 t} X_1 + a_2 e^{\lambda_2 t} X_2 + \cdots + a_n e^{\lambda_n t} X_n
\tag{12.39}
\]


for arbitrary constants a_1, a_2, ..., a_n if we have no initial condition. This is
called the general solution to our system of differential equations. If we
have the initial condition X(0) = X_0, then these arbitrary constants have
unique values given by


\[
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = Q^{-1} X_0,
\qquad \text{since} \qquad
a_1 X_1 + a_2 X_2 + \cdots + a_n X_n = X_0
\]


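The general solution given in (12.39) can be written as the alternative
matrix equation


\[
X(t) = Q e^{Dt} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= e^{At} Q \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\]


In order to see that this is true, we could multiply out this product of
matrices or simply check that it satisfies our system of differential
equations without an initial condition. We leave the multiplication to you
as your own check, and perform the second check here. For


\[
X(t) = Q e^{Dt} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\tag{12.40}
\]


we have


\[
\frac{dX}{dt} = Q D e^{Dt} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\]


which follows from the fact that


\[
\frac{d}{dt} e^{Dt} = D e^{Dt}
\]


and is left as an exercise for you to verify. Now


\[
AX(t) = AQ e^{Dt} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= QD e^{Dt} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= \frac{dX}{dt}
\]


since AQ = QD from A's diagonalizability, although the equation AQ =
QD is true even if Q is not invertible.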


If we have the initial condition X(0) = X_0, then the general solution given
in (12.39) at t = 0 yields


\[
X_0 = X(0) = Q e^{D \cdot 0} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= Q \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\]


since the exponential of the zero matrix is the identity matrix. Solving for
the unknown column matrix of the a_k terms gives


\[
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = Q^{-1} X_0
\]


It follows that our unique solution is


\[
X(t) = Q e^{Dt} Q^{-1} X_0
\]


to our homogeneous system under the initial condition X(0) = X_0. Of
course, e^{At} = Qe^{Dt}Q^{-1}, and so this solution can also be expressed as


\[
X(t) = e^{At} X_0
\]


which is of the same form as the solution to the simple differential
equation given in (12.10). Do not forget to take the real part of this
solution X(t) as your real answer to the problem when A is real and the
initial conditions are also real.
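
The two forms of the homogeneous solution can be compared numerically. The sketch below reuses the matrix and initial column of Example 12.4.2; the variable names vals, vecs, Xdiag, and Xexp, the sample time 0.3, and the use of the built-in MatrixExp for e^{At} are our own assumptions for illustration.

```mathematica
(* Matrix and initial column from Example 12.4.2 *)
A = N[{{-2, 9, -1}, {-4, 1, -Pi}, {5, -3, 4}}];
X0 = {{-6}, {10}, {7}};

(* Eigenvalues and eigenvector columns of A *)
{vals, vecs} = Eigensystem[A];
Q = Transpose[vecs];

(* X(t) = Q e^(D t) Q^-1 X0, keeping the real part for real t *)
Xdiag = ComplexExpand[Re[Q.DiagonalMatrix[Exp[vals t]].Inverse[Q].X0]];

(* X(t) = e^(A t) X0, evaluated at one sample time with MatrixExp *)
Xexp = Re[MatrixExp[0.3 A].X0];

(* The difference should vanish up to roundoff *)
Chop[(Xdiag /. t -> 0.3) - Xexp]
```

Agreement at a sample time is only a spot check, not a proof, but a nonzero difference would signal an error in Q or in the eigenvalue ordering.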


After working through examples of the homogeneous case, we looked at
an example of a nonhomogeneous linear differential equation. This was
given by


\[
\frac{dX}{dt} = AX(t) + F(t), \quad X(0) = X_0
\tag{12.41}
\]


In order to find the general solution to the nonhomogeneous case [i.e., in
which we do not consider the initial condition X(0) = X_0], when A is
diagonalizable, we use the method called variation of parameters, which
says that we should seek a general solution X(t) to the nonhomogeneous
case of the form given in equation (12.40), and modified as


\[
X(t) = Q e^{Dt} \begin{pmatrix} a_1(t) \\ a_2(t) \\ \vdots \\ a_n(t) \end{pmatrix}
\tag{12.42}
\]


where we have taken the general solution of the homogeneous case and
turned its arbitrary constants into unknown functions of t. If we compute
the derivative of this potential solution function X(t), we get from the
product rule that


\[
\frac{dX}{dt} = Q D e^{Dt} \begin{pmatrix} a_1(t) \\ a_2(t) \\ \vdots \\ a_n(t) \end{pmatrix}
+ Q e^{Dt} \begin{pmatrix} \frac{da_1}{dt}(t) \\ \frac{da_2}{dt}(t) \\ \vdots \\ \frac{da_n}{dt}(t) \end{pmatrix}
\]


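Setting dX/dt equal to AX(t) + F(t) to satisfy (12.41) gives


\[
Q D e^{Dt} \begin{pmatrix} a_1(t) \\ a_2(t) \\ \vdots \\ a_n(t) \end{pmatrix}
+ Q e^{Dt} \begin{pmatrix} \frac{da_1}{dt}(t) \\ \frac{da_2}{dt}(t) \\ \vdots \\ \frac{da_n}{dt}(t) \end{pmatrix}
= AQ e^{Dt} \begin{pmatrix} a_1(t) \\ a_2(t) \\ \vdots \\ a_n(t) \end{pmatrix} + F(t)
\]


However, since AQ = QD, the first term on the left cancels with the first
term on the right in the equation above, resulting in the simpler equation


\[
Q e^{Dt} \begin{pmatrix} \frac{da_1}{dt}(t) \\ \frac{da_2}{dt}(t) \\ \vdots \\ \frac{da_n}{dt}(t) \end{pmatrix} = F(t)
\]


Solving for the column matrix of a_k(t) derivatives gives


\[
\begin{pmatrix} \frac{da_1}{dt}(t) \\ \frac{da_2}{dt}(t) \\ \vdots \\ \frac{da_n}{dt}(t) \end{pmatrix}
= e^{-Dt} Q^{-1} F(t)
\]


Clearly, it is time to integrate in order to find the unknown functions a_1(t),
a_2(t), ..., a_n(t), and when you integrate a matrix of functions, you integrate
its entry functions. So


\[
\begin{pmatrix} a_1(t) \\ a_2(t) \\ \vdots \\ a_n(t) \end{pmatrix}
= \int e^{-Dt} Q^{-1} F(t)\,dt + C
\]


where C is a column of n arbitrary constants that can be used to satisfy an
initial condition, when given.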


Let Z(t) = ∫ e^{-Dt} Q^{-1} F(t) dt. Then the general solution X(t) to our
nonhomogeneous system of differential equations is


\[
X(t) = Q e^{Dt}\left(Z(t) + C\right)
\tag{12.43}
\]


for C a column of n arbitrary real constants. This general solution can also
be written as


\[
X(t) = e^{At}\left[\int e^{-At} F(t)\,dt + C\right]
\]


since e^{At} = Qe^{Dt}Q^{-1}. Here the arbitrary column C stands for QC in
terms of the C of the previous formula.


If our nonhomogeneous system of differential equations has the initial
condition X(0) = X_0, then C is no longer arbitrary but a particular column
of real numbers. In order to find this particular column C, we plug t = 0
into our general solution for X(t). This gives us


\[
X_0 = X(0) = Q e^{D \cdot 0}\left(Z(0) + C\right) = Q\left(Z(0) + C\right)
\]


since the exponential of the zero matrix is the identity of the same size. On
solving for C, we have


\[
C = Q^{-1} X_0 - Z(0)
\tag{12.44}
\]


So the solution to the nonhomogeneous system of differential equations
(12.41), including the initial condition, is now given by


\[
X(t) = Q e^{Dt}\left(Z(t) + Q^{-1} X_0 - Z(0)\right)
\tag{12.45}
\]


This form of the solution will avoid the creation of superfluous terms and
so should be used when A is diagonalizable. Do not forget to take the real
part of this solution X(t) as your real answer to the problem when A is real
and the initial conditions are also real.
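
The diagonalizable-case recipe X(t) = Qe^{Dt}(Z(t) + Q^{-1}X_0 - Z(0)) can be collected into one small function. This is a sketch under stated assumptions, not the authors' code: the function name solveByDiagonalization, its argument names, and the 2 × 2 test data are all hypothetical, and the function assumes the matrix passed in is diagonalizable.

```mathematica
(* Hypothetical helper: solves dX/dt = mat.X + forcing with X(0) = x0,
   assuming mat is diagonalizable, via Q e^(D t) (Z(t) + Q^-1 x0 - Z(0)) *)
solveByDiagonalization[mat_, forcing_, x0_, var_] :=
 Module[{vals, vecs, Q, Qinv, Z},
  {vals, vecs} = Eigensystem[mat];
  Q = Transpose[vecs];
  Qinv = Inverse[Q];
  (* Z(t) = integral of e^(-D t) Q^-1 F(t) dt, integrated entry by entry *)
  Z = Integrate[DiagonalMatrix[Exp[-vals var]].Qinv.forcing, var];
  Simplify[Q.DiagonalMatrix[Exp[vals var]].(Z + Qinv.x0 - (Z /. var -> 0))]]

(* Illustrative data: this matrix has the distinct real eigenvalues -1 and -2 *)
A = {{0, 1}, {-2, -3}};
F = {{0}, {Sin[t]}};
X0 = {{1}, {0}};
X = solveByDiagonalization[A, F, X0, t];

(* Both checks should simplify to zero columns *)
Simplify[D[X, t] - (A.X + F)]
Simplify[(X /. t -> 0) - X0]
```

Using exact integer entries in the test matrix keeps the arithmetic symbolic, so the two checks can simplify to exact zeros rather than to small floating-point residues.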


We can write this solution, using the original matrix A, as


\[
X(t) = e^{At}\left[\int e^{-At} F(t)\,dt + X_0 - W(0)\right]
\]


where W(t) = ∫ e^{-At} F(t) dt. This form of the solution should be used only
when A is not diagonalizable since it creates superfluous terms in the
solution with very small coefficient values; these superfluous terms occur
since QQ^{-1} ≈ I_n due to the fact that Q is not exact and we get very small
values instead of zero in this identity matrix.


This concludes our application of eigenvalues and eigenvectors in solving
square homogeneous and nonhomogeneous first-order linear systems of
differential equations with initial conditions when the coefficient matrix A
is diagonalizable. You are asked in the exercises to look at a problem
where A is not diagonalizable but is upper triangular of a form where e^{At}
can be computed. In the research projects, you are asked to study the
situation where A is not a constant matrix but its entries are functions of t.


In Section 12.5, we will focus on a collection of important facts about
eigenvalues and their eigenvectors with proofs for most of this
information.


Homework Problems 


(Note: It is permissible to use Mathematica to find the matrices D and Q, 
when the coefficient matrix A is diagonalizable, if the system is larger 
than 2 x 2.) 
1. Find the real solutions to the homogeneous first-order system of 
linear differential equations given by 


\[ \begin{aligned}
\frac{dx}{dt} &= 2x(t) + 5y(t) - z(t) \\
\frac{dy}{dt} &= 5x(t) - 3y(t) + 8z(t) \\
\frac{dz}{dt} &= -x(t) + 8y(t) - 7z(t)
\end{aligned} \]

where x(t), y(t), and z(t) are three unknown functions of t with the 
initial conditions x(0) = -12, y(0) = 4, and z(0) = 7. 

2. Find the real solutions to the nonhomogeneous first-order system 
of linear differential equations given by 




\[ \begin{aligned}
\frac{dx}{dt} &= 2x(t) + 5y(t) - z(t) + \cos(t) \\
\frac{dy}{dt} &= 5x(t) - 3y(t) + 8z(t) - \sin(t) \\
\frac{dz}{dt} &= -x(t) + 8y(t) - 7z(t) + e
\end{aligned} \]

where x(t), y(t), and z(t) are three unknown functions of t with the 
initial conditions x(0) = -12, y(0) = 4, and z(0) = 7. 

3. Find the real solutions to the homogeneous first-order system of 
linear differential equations given by 


\[ \begin{aligned}
\frac{dx}{dt} &= 7x(t) - 5y(t) \\
\frac{dy}{dt} &= 15x(t) - 3y(t)
\end{aligned} \]

where x(t) and y(t) are two unknown functions of t with the initial 
conditions x(0) = 8 and y(0) = -1. 

4. Find the real solutions to the nonhomogeneous first-order system 
of linear differential equations given by 


\[ \begin{aligned}
\frac{dx}{dt} &= 7x(t) - 5y(t) + t^2 \\
\frac{dy}{dt} &= 15x(t) - 3y(t) - \cos(4t)
\end{aligned} \]

where x(t) and y(t) are two unknown functions of t with the 
initial conditions x(0) = 8 and y(0) = -1. 

5. Find by hand, using backward substitution, the real solutions to 
the homogeneous first-order system of linear differential equations 
given by 




T = 5e(t) + 2y(t) 


dt 

dy _ 

q ~ ult) 
dz 

q 7 59 


where x(t), y(t), and z(t) are three unknown functions of t with the 
initial conditions x(0) = 2, y(0) = -1, and z(0) = 3. 

6. The upper triangular coefficient matrix A of the system from 
problem 5 is not diagonalizable. Can you use the methods of this 
section to get the solution to this system of differential equations? 
You will need to find both $e^{At}$ and $e^{-At}$. If yes, do so and compare 
what you get here with the results of problem 5 to see if it has 
worked. If no, explain why not. 

7. Find using matrix methods (check with DSolve) and plot for 
$t \in [-1, 1]$ the solution function y(t) to the fourth-order constant 
coefficient nonhomogeneous linear differential equation 


d'y dy dy., 
det eo 8 +5 Tt + 2y(t) = cos(t) 


with the initial conditions 


\[ y(0) = -1, \quad \frac{dy}{dt}(0) = 10, \quad \frac{d^2 y}{dt^2}(0) = 7, \quad \frac{d^3 y}{dt^3}(0) = 3 \]


8. Show that $\frac{d}{dt} e^{At} = A e^{At}$ is a correct derivative rule for A any 
real diagonalizable matrix. If the matrix A is not diagonalizable, 
then check that this derivative rule is still correct using the 
definition of the exponential of a square matrix as an infinite sum. 
You may use 

\[ \frac{d}{dt} \sum_{n=0}^{\infty} \frac{1}{n!} (At)^n = \sum_{n=1}^{\infty} \frac{1}{(n-1)!} A^n t^{n-1} \]




but you should justify why this derivative formula should be 
correct. 


9. For the nonhomogeneous system $\frac{dX}{dt} = AX(t) + F(t)$ of first-order 
linear differential equations, we found that the general solution to 
this system is 

\[ X(t) = e^{At} \left( \int e^{-At} F(t)\, dt + C \right) \]


Now verify this directly by computing the derivative of this solution 
and seeing that it satisfies the equation. 


Mathematica Problems 


1. Compare your answers to homework problems 1, 2, and 5 using 
matrix methods to the answer from DSolve. Plot the solution as a 
parametric space curve with the initial condition point. 

2. Compare your answer to homework problems 3 and 4 using 
matrix methods to the answer from DSolve. Plot the solution as a 
parametric curve with the initial condition point. 


Research Projects 


1. For the homogeneous system 


\[ \frac{dX}{dt} = AX(t) \]


of first-order linear differential equations, we found that the general 
solution to this system is 


\[ X(t) = e^{At} C \]

where C is an arbitrary constant column. If we change this system 
to one in which the real square matrix A is no longer constant but a 
matrix function A(t) instead, then we have the new system 

\[ \frac{dX}{dt} = A(t) X(t) \]

Prove or disprove that this system has the general solution 

\[ X(t) = e^{\int A(t)\, dt}\, C \]

If it does work, then check it on an example. 

2. For the nonhomogeneous system 

\[ \frac{dX}{dt} = AX(t) + F(t) \]

of first-order linear differential equations, we found that the general 
solution to this system is 

\[ X(t) = e^{At} \left( \int e^{-At} F(t)\, dt + C \right) \]

where C is an arbitrary constant column. If we change this system 
to one in which the real square matrix A is no longer constant but a 
matrix function A(t), then we have the new system 

\[ \frac{dX}{dt} = A(t) X(t) + F(t) \]

Prove or disprove that this system has the general solution 

\[ X(t) = e^{\int A(t)\, dt} \left( \int e^{-\int A(t)\, dt} F(t)\, dt + C \right) \]


If it does work, then check it on an example. 

3. (a) Is there a way, using matrix methods, to solve the following 
system of nonhomogeneous constant coefficient linear differential 
equations 




dx dx dy 
de = 8 + 37 + 4x(t) — y(t) = sin(t) 
d'y dz dy ! 
—2 + 7— — 27 + 9r(t) — 3y(t) = cos(t 
dt? “dt dt Feat) ake ai 


with initial conditions x(0) = -6, y(0) = 2, $\frac{dx}{dt}(0) = 1$, and 
$\frac{dy}{dt}(0) = -5$? If yes, then solve this problem for its two real 
solutions x(t) and y(t). If no, then explain why not. 
(b) Can DSolve solve this problem? 
(c) If the answer to part (a) was yes, then generalize your 
method to any-size square system. 


12.5 Basic Facts about Eigenvalues, Eigenvectors, and Diagonalizability 


Now we turn to some purely theoretical material concerning eigenvalues, 
eigenvectors, and diagonalizability. The few facts that we give without a 
proof would require a great deal of material to be developed for their 
proofs, and so we leave their proofs to a future course in advanced linear 
algebra if you should decide to take one, or as a project to satisfy your 
curiosity. 


Let us begin with some of the most basic facts and then move on to the 
more complicated ones. In all that follows, let A be a square n x n matrix 
with all complex entries unless otherwise specified. The typical case in 
many applied situations is that A has all real entries, but complex 
eigenvalues and eigenvectors. 


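Theorem 12.5.1. Let $A \in \mathbb{C}^{n \times n}$ and $\lambda \in \mathbb{C}$ be an eigenvalue of A. Also, 
let $E_\lambda$ be the subset of $\mathbb{C}^n$ consisting of $\vec{0}$ and all the eigenvectors $\vec{x}$ of 
A for the eigenvalue $\lambda$: 

\[ E_\lambda = \{ \vec{x} \in \mathbb{C}^n \mid A\vec{x} = \lambda \vec{x} \} \]

Then $E_\lambda$ is a subspace of $\mathbb{C}^n$, where $E_{\lambda_1} \cap E_{\lambda_2} = \{ \vec{0} \}$ when $\lambda_1 \neq \lambda_2$. 
Also, 

\[ 1 \le \dim(E_\lambda) \le M_\lambda \]

where $M_\lambda$ is the algebraic multiplicity of $\lambda$ as a root of A's characteristic 
polynomial $P(\lambda)$. 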


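Proof. Let us show that $E_\lambda$ is a subspace of $\mathbb{C}^n$ for any eigenvalue $\lambda$ of A. 
Since $E_\lambda$ is a nonempty subset of the vector space $\mathbb{C}^n$ over the field of 
complex numbers, all we need to show is that any linear combination of 
the elements of $E_\lambda$ is also in $E_\lambda$. In other words, we need to verify that 
for any two vectors $\vec{x}, \vec{y} \in E_\lambda$ and scalars $a, b \in \mathbb{C}$, we have that 
$a\vec{x} + b\vec{y} \in E_\lambda$. It is sufficient to show that 

\[ A(a\vec{x} + b\vec{y}) = \lambda (a\vec{x} + b\vec{y}) \]

Now 

\[ \begin{aligned}
A(a\vec{x} + b\vec{y}) &= a(A\vec{x}) + b(A\vec{y}) \\
&= a(\lambda \vec{x}) + b(\lambda \vec{y}) \\
&= \lambda (a\vec{x} + b\vec{y})
\end{aligned} \]

With this simple matrix calculation, we have demonstrated that $E_\lambda$ is a 
subspace of $\mathbb{C}^n$. 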


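Let $\lambda_1 \neq \lambda_2$ be two different eigenvalues for the matrix A. Assume that 
$E_{\lambda_1} \cap E_{\lambda_2} \neq \{ \vec{0} \}$. Then there exists some nonzero eigenvector 
$\vec{x} \in E_{\lambda_1} \cap E_{\lambda_2}$. So $A\vec{x} = \lambda_1 \vec{x}$ and $A\vec{x} = \lambda_2 \vec{x}$. When we combine 
these two equations, we have $\lambda_1 \vec{x} = \lambda_2 \vec{x}$, or $(\lambda_1 - \lambda_2)\vec{x} = \vec{0}$. 
Since $\vec{x} \neq \vec{0}$, we must have that $\lambda_1 - \lambda_2 = 0$ or $\lambda_1 = \lambda_2$. This is a 
contradiction, and so $E_{\lambda_1} \cap E_{\lambda_2} = \{ \vec{0} \}$. 
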
Next, we prove the second part of the theorem, namely, that 
$1 \le \dim(E_\lambda) \le M_\lambda$. Clearly, $1 \le \dim(E_\lambda)$, since only the subspace 
consisting of the zero vector has dimension zero. Also, $E_\lambda \neq \{ \vec{0} \}$, since 
$E_\lambda$ must contain an eigenvector $\vec{x} \neq \vec{0}$ corresponding to the eigenvalue $\lambda$. 
The proof of $\dim(E_\lambda) \le M_\lambda$ will not be given in this text. (If you combine 
Theorems 12.5.3 and 12.5.7 below, you should begin to see why this is true. 
It is left as a project for you to see how these two facts, with other 
information, can complete this proof.) 
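
The inequality $1 \le \dim(E_\lambda) \le M_\lambda$ can be observed on a small example. The sketch below uses a hypothetical 3 x 3 matrix (not from the text) whose eigenvalue 2 has algebraic multiplicity 2; the dimension of the eigenspace is found as the number of basis vectors returned by NullSpace.

```mathematica
(* Hypothetical example: eigenvalue 2 is a double root, eigenvalue 3 a simple root *)
matA = {{2, 1, 0}, {0, 2, 0}, {0, 0, 3}};

(* Algebraic multiplicity of 2: how often it appears among the eigenvalues *)
Count[Eigenvalues[matA], 2]                       (* 2 *)

(* dim(E_2): number of basis vectors of the null space of A - 2 I *)
Length[NullSpace[matA - 2 IdentityMatrix[3]]]     (* 1 *)
```

Here the eigenspace dimension is strictly smaller than the algebraic multiplicity, so both inequalities in the theorem are visible at once.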


Next, we look at two facts that relate the eigenvalues of A to the 
determinant of A, and a fact about the eigenvalues of upper and lower 
triangular matrices. In all that follows in this section, we let 
$A \in \mathbb{C}^{n \times n}$ unless otherwise stated. 


Theorem 12.5.2. The determinant of A is the constant term of A's 
characteristic polynomial. 


Proof. The characteristic polynomial of A is $P(\lambda) = \det(A - \lambda I_n)$. If you 
expand out this determinant, you will get 

\[ P(\lambda) = (-1)^n \lambda^n + a_{n-1} \lambda^{n-1} + a_{n-2} \lambda^{n-2} + \cdots + a_1 \lambda + a_0 \tag{12.46} \]

where the $a_k$ terms are the coefficients of $P(\lambda)$. Then, plugging in 0 for $\lambda$ 
in this polynomial gives 

\[ P(0) = \det(A - 0 \cdot I_n) = \det(A) \]

by its definition, and $P(0) = a_0$ in the expansion given in (12.46). 
Therefore, we have that $a_0 = \det(A)$. 
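
A quick check of this fact, as a sketch on a hypothetical 2 x 2 matrix: Mathematica's CharacteristicPolynomial[m, x] uses the same convention det(m - x I) as the text, so setting the variable to 0 should reproduce the determinant. The names matA, p, and lambda are illustrative.

```mathematica
(* Hypothetical example matrix *)
matA = {{2, 1}, {1, 3}};

(* P(lambda) = det(A - lambda I) *)
p = CharacteristicPolynomial[matA, lambda];

(* Constant term P(0) next to det(A) *)
{p /. lambda -> 0, Det[matA]}     (* {5, 5} *)
```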


Theorem 12.5.3. The determinant of A is the product of A's eigenvalues 
where each eigenvalue $\lambda$ is a factor in this product $M_\lambda$ times, where $M_\lambda$ is 
the algebraic multiplicity of $\lambda$ as a root of A's characteristic polynomial. 
Further, the sum of the multiplicities of A's distinct eigenvalues is n. In 
general, the algebraic multiplicity of a root of a polynomial is 1 and the 
polynomial has as many distinct roots as its degree. 


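Proof. Now, let us factor completely the characteristic polynomial as it is 
expressed in (12.46). We get 

\[ P(\lambda) = (-1)^n \prod_{j=1}^{k} (\lambda - \lambda_j)^{M_{\lambda_j}} \]

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the k distinct roots of $P(\lambda)$ corresponding to the k 
distinct eigenvalues of A. Their respective algebraic multiplicities are the 
positive integers $M_{\lambda_1}, M_{\lambda_2}, \ldots, M_{\lambda_k}$. Then, since the characteristic 
polynomial $P(\lambda)$ has degree n, we must have 

\[ M_{\lambda_1} + M_{\lambda_2} + \cdots + M_{\lambda_k} = n \]

Now, from Theorem 12.5.2, we have 

\[ \begin{aligned}
\det(A) = P(0) &= (-1)^n \prod_{j=1}^{k} (-\lambda_j)^{M_{\lambda_j}} \\
&= (-1)^{n + M_{\lambda_1} + \cdots + M_{\lambda_k}} \prod_{j=1}^{k} \lambda_j^{M_{\lambda_j}} \\
&= (-1)^{2n} \prod_{j=1}^{k} \lambda_j^{M_{\lambda_j}} = \prod_{j=1}^{k} \lambda_j^{M_{\lambda_j}}
\end{aligned} \]

So the determinant of A is the product of A's eigenvalues, where each 
eigenvalue $\lambda$ is a factor in this product $M_\lambda$ times, and $M_\lambda$ is the algebraic 
multiplicity of $\lambda$ as a root of A's characteristic polynomial. 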
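
The same statement can be tried out directly, as a sketch on a hypothetical matrix with a repeated eigenvalue. Eigenvalues lists each eigenvalue as many times as its algebraic multiplicity, so the product of the list should equal the determinant and the length of the list should equal n.

```mathematica
(* Hypothetical example: eigenvalues 2, 2, 3 *)
matA = {{2, 1, 0}, {0, 2, 0}, {0, 0, 3}};

(* Eigenvalues are repeated according to algebraic multiplicity *)
vals = Eigenvalues[matA];

(* Product of eigenvalues, determinant, and number of eigenvalues listed *)
{Times @@ vals, Det[matA], Length[vals]}     (* {12, 12, 3} *)
```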




Theorem 12.5.4. If A is a lower or upper triangular matrix, then A's 
eigenvalues are its diagonal entries and its characteristic polynomial is 

\[ P(\lambda) = \prod_{k=1}^{n} (A_{k,k} - \lambda) \tag{12.47} \]

where A's diagonal entries are $A_{1,1}, A_{2,2}, \ldots, A_{n,n}$. 


Proof. In order to compute A's characteristic polynomial 

\[ P(\lambda) = \det(A - \lambda I_n) \]

when A is upper triangular, expand along the first column of $A - \lambda I_n$ to 
find this determinant and then continue expanding along first columns at 
each future determinant computation. This gives exactly (12.47). From 
this factoring of the characteristic polynomial we have that A's diagonal 
entries are its eigenvalues. If A is lower triangular, then expand along rows 
instead of columns to find the characteristic polynomial. 
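
As a small illustration, assuming a hypothetical upper triangular matrix with diagonal entries 1, 4, and 6, the factored characteristic polynomial and the eigenvalues can be compared with the diagonal:

```mathematica
(* Hypothetical upper triangular example *)
matT = {{1, 2, 3}, {0, 4, 5}, {0, 0, 6}};

(* The factors should be (1 - lambda), (4 - lambda), (6 - lambda) up to rearranged signs *)
Factor[CharacteristicPolynomial[matT, lambda]]

(* The eigenvalues agree with the diagonal entries once both are sorted *)
Sort[Eigenvalues[matT]] == Sort[Diagonal[matT]]     (* True *)
```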


Now let us turn to facts related to the diagonalizability of A, and when A is 
not diagonalizable. 


Theorem 12.5.5. Let A be diagonalizable with $A = QDQ^{-1}$. Then A and D 
have the same characteristic polynomial $P(\lambda)$. 


Proof. Let $A = QDQ^{-1}$, and define $P_A(\lambda) = \det(A - \lambda I_n)$ and 
$P_D(\lambda) = \det(D - \lambda I_n)$. Then 

\[ \begin{aligned}
P_A(\lambda) &= \det(A - \lambda I_n) \\
&= \det(QDQ^{-1} - \lambda I_n) \\
&= \det\left( Q (D - \lambda I_n) Q^{-1} \right) \\
&= \det(Q) \det(D - \lambda I_n) \det(Q^{-1}) \\
&= \det(Q) \det(D - \lambda I_n) \det(Q)^{-1} \\
&= \det(D - \lambda I_n) \\
&= P_D(\lambda)
\end{aligned} \]

So A and D have the same characteristic polynomial $P(\lambda)$. 

In particular, we now have det(A) = det(D) by Theorem 12.5.3. Also, as a 
consequence of Theorem 12.5.4, we have that each distinct eigenvalue $\lambda_k$ 
of A must appear on the diagonal of D exactly $M_{\lambda_k}$ times for $M_{\lambda_k}$ the 
algebraic multiplicity of $\lambda_k$ as a root of A's characteristic polynomial $P(\lambda)$. 
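
A minimal sketch of this comparison, assuming a hypothetical diagonalizable 2 x 2 matrix: the diagonal matrix is built from the eigenvalues, and the two characteristic polynomials are subtracted. The names matA and matD are illustrative (a capital D is avoided because it is Mathematica's derivative command).

```mathematica
(* Hypothetical example with eigenvalues 5 and 2 *)
matA = {{4, 1}, {2, 3}};

(* Diagonal matrix of the eigenvalues of A *)
matD = DiagonalMatrix[Eigenvalues[matA]];

(* The difference of the two characteristic polynomials should expand to 0 *)
Expand[CharacteristicPolynomial[matA, lambda] -
  CharacteristicPolynomial[matD, lambda]]

(* Equal determinants follow *)
Det[matA] == Det[matD]     (* True *)
```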


Before we can discuss the conditions under which the square matrix A is 
or is not diagonalizable, we need to know something about the 
independence of a set of eigenvectors of A with one eigenvector chosen 
for each eigenvalue of A. This bears on whether the matrix Q is 
invertible in our diagonalizability discussion for A. 


Theorem 12.5.6. Let $\lambda_1, \lambda_2, \ldots, \lambda_k \in \mathbb{C}$ be the k distinct roots of the 
characteristic polynomial $P(\lambda)$ of A; that is, they are the k distinct 
eigenvalues of A. Now let $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k$ be k eigenvectors of A 
corresponding to the k distinct eigenvalues of A. Then the set 
$\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \}$ is an independent subset of $\mathbb{C}^n$. 

Proof. We will show by finite induction on the size s of the subset 
$\{ \vec{x}_{j_1}, \vec{x}_{j_2}, \ldots, \vec{x}_{j_s} \}$ that the full set $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \}$ is independent. 
In other words, we will show that if s = 1, then $\{ \vec{x}_{j_1} \}$ is independent, and 
if for s < k every subset $\{ \vec{x}_{j_1}, \vec{x}_{j_2}, \ldots, \vec{x}_{j_s} \}$ of size s or less is 
independent, then $\{ \vec{x}_{j_1}, \vec{x}_{j_2}, \ldots, \vec{x}_{j_s}, \vec{x}_{j_{s+1}} \}$ is also independent for 
any subset of size s + 1 of $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \}$. From this, it would follow 
that $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \}$ must be independent. 


When s = 1, the set $\{ \vec{x}_{j_1} \}$ is independent since any nonzero vector 
automatically forms an independent set. Next we let s < k and assume that 
the set $\{ \vec{x}_{j_1}, \vec{x}_{j_2}, \ldots, \vec{x}_{j_s} \}$ is independent for all subsets with s distinct 
elements from the set $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \}$. Now we need to show that any 
subset of s + 1 distinct elements of $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \}$ is also 
independent. Assume that $\{ \vec{x}_{j_1}, \vec{x}_{j_2}, \ldots, \vec{x}_{j_s}, \vec{x}_{j_{s+1}} \}$ is a dependent set 
for some subset of size s + 1. We seek a contradiction that then forces our 
subset to be independent. In order to avoid going blind because of this 
double subscripting, without loss of generality we will say that our 
dependent subset is $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_s, \vec{x}_{s+1} \}$. This dependence implies 
that there exist complex numbers $b_1, b_2, \ldots, b_s, b_{s+1}$ with at least one $b_k$ 
nonzero such that 

\[ b_1 \vec{x}_1 + b_2 \vec{x}_2 + \cdots + b_s \vec{x}_s + b_{s+1} \vec{x}_{s+1} = \vec{0} \tag{12.48} \]

Now, if any of these $b_k$ values is zero, then we have remaining a linear 
combination of s or less of our $\vec{x}_k$ vectors that equals $\vec{0}$, which forces all 
remaining $b_k$ values to be zero by the independence of any subset of the 
$\vec{x}_k$ vectors of size s or less. Hence, all the $b_k$ values must be nonzero. 


Now multiply equation (12.48) by the matrix A, and apply the fact that 
$A\vec{x}_k = \lambda_k \vec{x}_k$ to get the new equation 

\[ b_1 \lambda_1 \vec{x}_1 + b_2 \lambda_2 \vec{x}_2 + \cdots + b_s \lambda_s \vec{x}_s + b_{s+1} \lambda_{s+1} \vec{x}_{s+1} = \vec{0} \tag{12.49} \]

Since the eigenvalues we are using are all distinct, we cannot have two of 
them that are both 0. There must be at least one nonzero eigenvalue in the 
set $\{ \lambda_1, \lambda_2, \ldots, \lambda_s, \lambda_{s+1} \}$ since s is at least 1. Let us say that $\lambda_{s+1} \neq 0$ for 
convenience. 


Let us now look closely at the two equations (12.48) and (12.49). We can 
solve each of these two equations for the last eigenvector, $\vec{x}_{s+1}$, since the 
coefficients $b_{s+1}$ and $b_{s+1} \lambda_{s+1}$ are both nonzero, as we have already 
argued. Equation (12.48) yields 

\[ \vec{x}_{s+1} = -\frac{1}{b_{s+1}} \left( b_1 \vec{x}_1 + b_2 \vec{x}_2 + \cdots + b_s \vec{x}_s \right) \]

while equation (12.49) gives 

\[ \vec{x}_{s+1} = -\frac{1}{b_{s+1} \lambda_{s+1}} \left( b_1 \lambda_1 \vec{x}_1 + b_2 \lambda_2 \vec{x}_2 + \cdots + b_s \lambda_s \vec{x}_s \right) \]

Now the set $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_s \}$ has s elements, and thus is independent. If 
two linear combinations of an independent set's elements are equal, then 
all of their coefficients must be equal for corresponding vectors in the set. 
This is our situation, which forces the two coefficients of $\vec{x}_1$ to be equal. 
This results in the equation 

\[ \frac{b_1}{b_{s+1}} = \frac{b_1 \lambda_1}{b_{s+1} \lambda_{s+1}} \]

If we simplify this equation, we find that $\lambda_1 = \lambda_{s+1}$. Finally, we have arrived 
at the desired contradiction since all the eigenvalues of A that we are using 
are distinct; no duplications are allowed. Since this holds for arbitrary 
s < k, we can now conclude that the set $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \}$ is an independent 
subset of $\mathbb{C}^n$. 
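
Independence of eigenvectors belonging to distinct eigenvalues can be checked by a rank computation. The sketch below assumes a hypothetical 3 x 3 matrix with the three distinct eigenvalues 1, 4, and 6; Eigenvectors returns one eigenvector per row.

```mathematica
(* Hypothetical example with three distinct eigenvalues *)
matA = {{1, 2, 3}, {0, 4, 5}, {0, 0, 6}};

(* One eigenvector per row, one for each eigenvalue *)
vecs = Eigenvectors[matA];

(* Full rank means the eigenvectors form an independent set *)
MatrixRank[vecs] == Length[vecs]     (* True *)
```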


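Theorem 12.5.7. Let $\lambda_1, \lambda_2, \ldots, \lambda_k \in \mathbb{C}$ be the k distinct eigenvalues of A. 
Also, let $B_1, B_2, \ldots, B_k$ be bases for the eigenspaces $E_{\lambda_1}, E_{\lambda_2}, \ldots, E_{\lambda_k}$, 
respectively. Then $B_1 \cup B_2 \cup \cdots \cup B_k$ is an independent set in $\mathbb{C}^n$. 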


We leave the proof of Theorem 12.5.7 as an exercise. It is similar in spirit 
to, and should be viewed as an extension of, Theorem 12.5.6. 


If we now combine several of these facts, we will see when a square 
matrix A is or is not diagonalizable. 


Theorem 12.5.8. A square matrix A is diagonalizable if and only if for 
each eigenvalue $\lambda$ of A, we have $\dim(E_\lambda) = M_\lambda$. 


Proof. First, we prove that if a square matrix A is diagonalizable, then, for 
each eigenvalue $\lambda$ of A, we must have $\dim(E_\lambda) = M_\lambda$. So let A be 
diagonalizable with $A = QDQ^{-1}$ for diagonal matrix D with the 
eigenvalues of A on its diagonal and Q an invertible matrix whose 
columns are eigenvectors of A for the corresponding eigenvalues of A in 
D. So, by Theorem 12.5.5, each distinct eigenvalue $\lambda_k$ of A must appear on 
the diagonal of D exactly $M_{\lambda_k}$ times. Then, corresponding to each distinct 
eigenvalue $\lambda_k$ of A, we must have $M_{\lambda_k}$ independent eigenvectors for the 
eigenvalue $\lambda_k$ appearing as columns in Q in the same location as $\lambda_k$ 
appears in D. The independence is necessary since if any subset of the 
columns of Q form a dependent set, then Q has no inverse. So 
$\dim(E_\lambda) \ge M_\lambda$ since the dimension of a vector space is the size of its largest 
independent subset. By Theorem 12.5.1, $\dim(E_\lambda) \le M_\lambda$ for each 
eigenvalue $\lambda$ of A. Combining these two inequalities, we have 
$\dim(E_\lambda) = M_\lambda$ for each eigenvalue $\lambda$ of A. 


Next, we prove that if we have $\dim(E_\lambda) = M_\lambda$ for each eigenvalue $\lambda$ of A, 
then A is diagonalizable. 


In order to prove that A is diagonalizable, we only need to construct the 
diagonal matrix D of eigenvalues of A and the invertible matrix Q of 
corresponding eigenvectors of A. The diagonal matrix D is always 
constructible since we can place each distinct eigenvalue $\lambda$ of A on D's 
diagonal $M_\lambda$ times, giving a total of n eigenvalues since the sum of all the 
multiplicities of A's distinct eigenvalues is n by Theorem 12.5.3. 


As to the construction of Q, since $\dim(E_\lambda) = M_\lambda$ for each distinct 
eigenvalue $\lambda$ of A, we can find a basis $B_\lambda$ of each subspace $E_\lambda$ consisting of 
$M_\lambda$ elements. Now place in the matrix Q the elements of the basis $B_\lambda$ in 
the same locations where you have placed their eigenvalue $\lambda$ in D. This 
completes Q since Q must have n columns and the sum of all the 
multiplicities of A's distinct eigenvalues is n. The final piece of the proof 
is that Q must be invertible. This fact follows from Theorem 12.5.7, since 
a square matrix is invertible if and only if its columns form an independent 
set. 
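
The criterion of Theorem 12.5.8 can be turned into a short test, sketched below for matrices with exact entries. The function name sameMultiplicitiesQ is a hypothetical helper written for this illustration, not a built-in command, and AllTrue assumes a reasonably recent version of Mathematica.

```mathematica
(* Hypothetical helper: True when dim(E_lambda) equals M_lambda for every eigenvalue.
   Tally pairs each distinct eigenvalue with the number of times it occurs. *)
sameMultiplicitiesQ[m_?MatrixQ] :=
  AllTrue[Tally[Eigenvalues[m]],
    Length[NullSpace[m - First[#] IdentityMatrix[Length[m]]]] == Last[#] &]

(* Distinct eigenvalues 5 and 2, so the matrix is diagonalizable *)
sameMultiplicitiesQ[{{4, 1}, {2, 3}}]     (* True *)

(* Eigenvalue 2 has M = 2 but a one-dimensional eigenspace *)
sameMultiplicitiesQ[{{2, 1}, {0, 2}}]     (* False *)
```

For matrices with decimal entries, repeated eigenvalues may differ by rounding error, so this exact comparison should be treated with caution there.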


Theorem 12.5.9. A square matrix A is not diagonalizable if and only if for 
some eigenvalue $\lambda$ of A, we have $\dim(E_\lambda) < M_\lambda$. 


Proof. This part follows from Theorems 12.5.1 and 12.5.8. Note that a 
matrix A is not diagonalizable if and only if the matrix Q is not 
constructible from independent eigenvectors of A to have an inverse $Q^{-1}$. 


Theorem 12.5.10. A square $n \times n$ matrix A is diagonalizable if and only if 
there is a basis for $\mathbb{C}^n$ consisting entirely of eigenvectors of A. 


Proof. If A is diagonalizable, then $A = QDQ^{-1}$. Q's columns 
$\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n$ are n eigenvectors of A and elements of $\mathbb{C}^n$. Q has an 
inverse if and only if its columns (and rows) form a basis of $\mathbb{C}^n$. So the 
existence of this inverse would force Q's columns to be a basis of $\mathbb{C}^n$. 


Conversely, let $\{ \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n \}$ be a basis for $\mathbb{C}^n$ consisting entirely of 
eigenvectors of A. Then the matrix Q with these n vectors as its columns 
has an inverse matrix $Q^{-1}$. If we also let D be the diagonal matrix with the 
eigenvalues of A corresponding to these $\vec{x}_k$'s on its diagonal, then A is 
diagonalizable with $A = QDQ^{-1}$, since $AQ = QD$ automatically holds for 
any $n \times n$ matrix Q of eigenvectors of A and the corresponding diagonal 
matrix of eigenvalues D. 
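
The construction in this proof can be carried out directly, as a sketch on a hypothetical 2 x 2 matrix. Eigensystem returns the eigenvalues together with the eigenvectors as rows, so a transpose places the eigenvectors in the columns of Q in the same order as the eigenvalues in D. The names matA, matQ, and matD are illustrative.

```mathematica
(* Hypothetical example with eigenvalues 5 and 2 *)
matA = {{4, 1}, {2, 3}};

{vals, vecs} = Eigensystem[matA];
matQ = Transpose[vecs];          (* eigenvectors as columns *)
matD = DiagonalMatrix[vals];     (* eigenvalues in matching order *)

(* A Q = Q D holds for any such Q; A = Q D Q^-1 also needs Q to be invertible *)
{matA . matQ == matQ . matD, matQ . matD . Inverse[matQ] == matA}     (* {True, True} *)
```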


The following theorem is a culmination of the previous facts related to A’s 
diagonalizability. It gives the typical situation when A is diagonalizable 
and some examples of general matrices that are not diagonalizable. 


Theorem 12.5.11. If a square $n \times n$ matrix A has n distinct eigenvalues, 
then A is diagonalizable. 


Proof. If a square $n \times n$ matrix A has n distinct eigenvalues, then the fact 
that A is diagonalizable follows from Theorems 12.5.8 and 12.5.1. In this 
case all the eigenvalues of A must have algebraic multiplicity 1 since the 
characteristic polynomial $P(\lambda)$ of A has degree n, so $1 \le \dim(E_\lambda) \le M_\lambda = 1$ 
for each eigenvalue $\lambda$ of A, giving $\dim(E_\lambda) = 1 = M_\lambda$. 


Nondiagonalizability is fairly rare for square $n \times n$ matrices A since most 
characteristic polynomials $P(\lambda)$ have n distinct roots, with each root 
having algebraic multiplicity 1. As a consequence of Theorems 12.5.4 and 
12.5.11, if A is upper or lower triangular with distinct diagonal entries, 
then A is diagonalizable. 


Example 12.5.1. As some examples of nondiagonalizable square matrices, consider the 
following families of upper triangular matrices 

\[ A = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix}, \qquad B = \begin{bmatrix} a & b & c \\ 0 & a & d \\ 0 & 0 & a \end{bmatrix} \tag{12.50} \]

where $b \neq 0$ in the matrix A, and at least one of b, c, or d is not zero in B. These two types 
of upper triangular matrices have their only eigenvalue a along their diagonals. We leave 
it as an exercise to check that the $3 \times 3$ matrix B is nondiagonalizable, while we now 
show that A is nondiagonalizable. 


The characteristic polynomial for A is 

\[ P(\lambda) = \det(A - \lambda I_2) = (a - \lambda)^2 = (\lambda - a)^2 \]

Now, solving $P(\lambda) = 0$ tells us that a is the only eigenvalue of A and it has algebraic 
multiplicity $M_a = 2$. 


Now let us find a basis of the eigenspace $E_a$. In order to do this, we must solve the 
linear system of equations given by 

\[ (A - a I_2)\, \vec{x} = \vec{0} \tag{12.51} \]

for the eigenvectors $\vec{x}$ for eigenvalue a. Equation (12.51) really is 

\[ \begin{bmatrix} 0 & b \\ 0 & 0 \end{bmatrix} \vec{x} = \vec{0} \]

which has solutions $\vec{x} = [\, c \;\; 0 \,]^T$ for an arbitrary scalar c. These solutions $\vec{x}$ form the 
eigenspace $E_a$ and give us a basis vector $\langle 1, 0 \rangle$. So $\dim(E_a) = 1 < M_a = 2$, and by 
Theorem 12.5.9, A is not diagonalizable. 
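
The computation in this example can be repeated in Mathematica for one member of the family. The values a = 3 and b = 2 below are arbitrary choices made for illustration.

```mathematica
(* One member of the 2 x 2 family, with the assumed values a = 3 and b = 2 *)
matA = {{3, 2}, {0, 3}};

(* The only eigenvalue is 3, with algebraic multiplicity 2 *)
Eigenvalues[matA]                           (* {3, 3} *)

(* Basis of the eigenspace: the null space of A - 3 I *)
NullSpace[matA - 3 IdentityMatrix[2]]       (* {{1, 0}} *)
```

A single basis vector for an eigenvalue of multiplicity 2 is exactly the situation of Theorem 12.5.9.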


You should refer to an advanced linear algebra text to see a complete 
description of nondiagonalizable square matrices, as well as a discussion 
of canonical forms, of which diagonalizability is only one. Canonical 
forms for square matrices are classified according to whether two square 
matrices are similar; the square matrix A and the diagonal matrix D of A’s 
eigenvalues are similar matrices when the matrix Q exists and is invertible 
in order for A to be diagonalizable. 


Our final theorems (hopefully, you have not fallen asleep yet) concerning 
A go back to its eigenvalues and eigenvectors and then move on to its 
diagonalizability. These theorems will hold when A is a real matrix 
because this property allows certain favorable behavior in the roots of A's 
characteristic polynomial. 

Theorem 12.5.12. If A is a real matrix, then A's complex eigenvalues 
occur in complex conjugate pairs $\lambda$ and $\bar{\lambda}$. In addition, if $\vec{x}$ is an 
eigenvector for $\lambda$, then $\bar{\vec{x}}$ is an eigenvector for $\bar{\lambda}$. Moreover, if A is $n \times n$ 
and n is odd, then A has at least one real eigenvalue $\lambda$ and its eigenspace 
$E_\lambda$ has a real basis since A is real. 


Proof. Let $P(\lambda)$ be A's characteristic polynomial. Then all of the 
coefficients of the characteristic polynomial $P(\lambda)$ are real since A is real. 
Now the complex conjugation operation has several useful properties 
(given without proof, but you should study these properties as exercises), 
which are that for any two complex numbers c and d, you will find that 
$\overline{c + d} = \bar{c} + \bar{d}$ and $\overline{c \cdot d} = \bar{c} \cdot \bar{d}$. These two properties apply to 
differences as well as sums, divisions, and products, and to more terms or 
factors than just two. 


Let r be a root of the characteristic polynomial $P(\lambda)$. Then P(r) = 0. If we 
take the conjugate of this equation, then we get $P(\bar{r}) = 0$ by factoring in 
the conjugation properties mentioned above since the polynomial's 
coefficients are all real. So any real coefficient polynomial has complex 
roots that occur in complex conjugate pairs. It follows that $P(\lambda)$ has at 
least one real root r if its degree is odd since it must have an even number 
of nonreal complex roots while its total number of roots is odd. Then, if A 
is $n \times n$ and n is odd, it follows that A has at least one real eigenvalue since 
$P(\lambda)$ has degree n. 


Let $\vec{x}$ be an eigenvector for $\lambda$. Then $A\vec{x} = \lambda \vec{x}$. If we take the complex 
conjugate of this matrix equation, we have 

\[ \overline{A\vec{x}} = \bar{A}\, \bar{\vec{x}} = A \bar{\vec{x}} \]

for the conjugate of the LHS, while on the RHS, $\overline{\lambda \vec{x}} = \bar{\lambda}\, \bar{\vec{x}}$. If we 
combine these two, we get $\bar{\vec{x}}$ as an eigenvector for $\bar{\lambda}$. 
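
Both statements can be observed on a real matrix with nonreal eigenvalues. The sketch below assumes the hypothetical 2 x 2 matrix of a quarter-turn rotation, whose eigenvalues are i and -i.

```mathematica
(* Hypothetical real matrix with eigenvalues I and -I *)
matA = {{0, -1}, {1, 0}};

{vals, vecs} = Eigensystem[matA];

(* The set of eigenvalues is unchanged by conjugation *)
Union[vals] === Union[Conjugate[vals]]     (* True *)

(* The conjugate of an eigenvector is an eigenvector for the conjugate eigenvalue *)
matA . Conjugate[vecs[[1]]] == Conjugate[vals[[1]]] Conjugate[vecs[[1]]]     (* True *)
```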

Theorem 12.5.13. If A is a real symmetric matrix, then all of A's 
eigenvalues will be real and there exists a real basis for each of A's 
eigenspaces $E_\lambda$. Also, the eigenspaces of A are perpendicular (or 
orthogonal) to each other, meaning that if $\vec{x}_1 \in E_{\lambda_1}$ and $\vec{x}_2 \in E_{\lambda_2}$ are 
two real eigenvectors for two distinct eigenvalues $\lambda_1$ and $\lambda_2$ of A, then the 
vectors $\vec{x}_1$ and $\vec{x}_2$ are perpendicular to each other, and hence $\vec{x}_1 \cdot \vec{x}_2 = 0$. 




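Proof. Let $\lambda$ be an eigenvalue of A with eigenvector $\vec{x}$, that is, $A\vec{x} = \lambda \vec{x}$, 
where $\vec{x} \in \mathbb{C}^n$ and $\vec{x} \neq \vec{0}$. Now we need the complex dot product on $\mathbb{C}^n$. 
Recall that if $a \in \mathbb{C}$, then $|a|^2 = a \bar{a}$. As a consequence, for $\vec{v} \in \mathbb{C}^n$, 
we get 

\[ \begin{aligned}
|\vec{v}|^2 &= \vec{v} \cdot \vec{v} \\
&= \vec{v}^{\,T}\, \bar{\vec{v}} \\
&= \sum_{k=1}^{n} v_k \bar{v}_k \\
&= \sum_{k=1}^{n} |v_k|^2
\end{aligned} \]

which implies that $|\vec{v}|^2 \in \mathbb{R}$, and $|\vec{v}|^2 = 0$ if and only if $\vec{v} = \vec{0}$. We 
will also make use of the last two dot product properties listed in the table 
at the end of Section 6.2 (immediately preceding the homework problems 
there). Let $\alpha, \beta \in \mathbb{C}$ and $\vec{u}, \vec{v} \in \mathbb{C}^n$; then we have 

\[ (\alpha \vec{u}) \cdot \vec{v} = \alpha (\vec{u} \cdot \vec{v}) = \vec{u} \cdot (\bar{\alpha} \vec{v}), \qquad \vec{u} \cdot (\beta \vec{v}) = \bar{\beta} (\vec{u} \cdot \vec{v}) = (\bar{\beta} \vec{u}) \cdot \vec{v} \]

Now let B be any complex $n \times n$ matrix. Then we take the complex dot 
product of $B\vec{y}$ and $\vec{z}$ for any $\vec{y}, \vec{z} \in \mathbb{C}^n$ to get 

\[ \begin{aligned}
(B\vec{y}) \cdot \vec{z} &= (B\vec{y})^T\, \bar{\vec{z}} \\
&= (\vec{y}^{\,T} B^T)\, \bar{\vec{z}} \\
&= \vec{y}^{\,T} (B^T \bar{\vec{z}}) \\
&= \vec{y}^{\,T}\, \overline{(\bar{B}^T \vec{z})} \\
&= \vec{y} \cdot (\bar{B}^T \vec{z})
\end{aligned} \]

We have assumed here that the conjugate of a product is the product of the 
conjugates. This is true for the product of either complex numbers or 
matrices with complex entries. 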


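Finally, we can prove our theorem. We now look at the complex dot 
product of $A\vec{x}$ and $\vec{x}$. This first gives us 

\[ \begin{aligned}
(A\vec{x}) \cdot \vec{x} &= (\lambda \vec{x}) \cdot \vec{x} \\
&= \lambda (\vec{x} \cdot \vec{x}) \\
&= \lambda |\vec{x}|^2
\end{aligned} \]

where $|\vec{x}|^2 \neq 0$, since $\vec{x} \neq \vec{0}$. Now, from the previous results presented 
above, we also have that 

\[ (A\vec{x}) \cdot \vec{x} = \vec{x} \cdot (A\vec{x}) \]

because A is both real and symmetric, so that $\bar{A}^T = A$. Since 
$\vec{x} \cdot (A\vec{x}) = \vec{x} \cdot (\lambda \vec{x}) = \bar{\lambda} |\vec{x}|^2$, putting these two pieces of our 
puzzle together, we get 

\[ \lambda |\vec{x}|^2 = \bar{\lambda} |\vec{x}|^2 \]

and so $\lambda = \bar{\lambda}$. This implies that $\lambda$ is real, so we can now conclude that all 
of A's eigenvalues are real. 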


Next, we turn our attention to A's eigenvectors. Remember that for the 
eigenvalue $\lambda$ of A, $\vec{x}$ is an eigenvector of A for $\lambda$ if $\vec{x} \in \mathbb{C}^n$ is a solution 
to the linear system $(A - \lambda I_n)\vec{x} = \vec{0}$. Since the matrix $A - \lambda I_n$ is now a 
real matrix, this system of linear equations can be solved to produce a real 
basis for its solution space, so in the remainder of this proof, we can treat 
$E_\lambda$ as a subspace of $\mathbb{R}^n$ instead of $\mathbb{C}^n$. 


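We want to show that if $\vec{x}_1 \in E_{\lambda_1}$ and $\vec{x}_2 \in E_{\lambda_2}$ are two real 
eigenvectors for two distinct eigenvalues $\lambda_1$ and $\lambda_2$ of A, then the vectors 
$\vec{x}_1$ and $\vec{x}_2$ are orthogonal to each other. We need to compute the real dot 
product of $A(\vec{x}_1 + \vec{x}_2)$ with itself in two different ways. First, we obtain 

\[ \begin{aligned}
A(\vec{x}_1 + \vec{x}_2) \cdot A(\vec{x}_1 + \vec{x}_2) &= (\lambda_1 \vec{x}_1 + \lambda_2 \vec{x}_2) \cdot (\lambda_1 \vec{x}_1 + \lambda_2 \vec{x}_2) \\
&= \lambda_1^2 |\vec{x}_1|^2 + 2 \lambda_1 \lambda_2\, \vec{x}_1 \cdot \vec{x}_2 + \lambda_2^2 |\vec{x}_2|^2
\end{aligned} \tag{12.52} \]

Second, since A is real and symmetric, we can move one factor of A across 
the dot product, and we have 

\[ \begin{aligned}
A(\vec{x}_1 + \vec{x}_2) \cdot A(\vec{x}_1 + \vec{x}_2) &= (\vec{x}_1 + \vec{x}_2) \cdot A^2 (\vec{x}_1 + \vec{x}_2) \\
&= (\vec{x}_1 + \vec{x}_2) \cdot (\lambda_1^2 \vec{x}_1 + \lambda_2^2 \vec{x}_2) \\
&= \lambda_1^2 |\vec{x}_1|^2 + (\lambda_1^2 + \lambda_2^2)\, \vec{x}_1 \cdot \vec{x}_2 + \lambda_2^2 |\vec{x}_2|^2
\end{aligned} \tag{12.53} \]

If we equate these two calculations and cancel equal terms, we get that 

\[ 2 \lambda_1 \lambda_2\, \vec{x}_1 \cdot \vec{x}_2 = (\lambda_1^2 + \lambda_2^2)\, \vec{x}_1 \cdot \vec{x}_2 \]

Assume that our two eigenvectors are not orthogonal; then $\vec{x}_1 \cdot \vec{x}_2 \neq 0$ and 
our last equation simplifies to 

\[ 2 \lambda_1 \lambda_2 = \lambda_1^2 + \lambda_2^2 \]

which is the same as 

\[ (\lambda_1 - \lambda_2)^2 = 0 \]

This forces $\lambda_1 = \lambda_2$, which is not possible since we stated that these two 
eigenvalues are distinct. This is a contradiction, and so our two 
eigenvectors are orthogonal, or $\vec{x}_1 \cdot \vec{x}_2 = 0$. 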
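
Both conclusions of the theorem can be seen on a small case. The sketch below assumes a hypothetical real symmetric 2 x 2 matrix with the distinct eigenvalues 3 and 1.

```mathematica
(* Hypothetical real symmetric example *)
matS = {{2, 1}, {1, 2}};

{vals, vecs} = Eigensystem[matS];

(* Both eigenvalues are real *)
vals                       (* {3, 1} *)

(* Eigenvectors for the two distinct eigenvalues have dot product 0 *)
vecs[[1]] . vecs[[2]]      (* 0 *)
```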


As a consequence of the proof of this theorem, we also have the following 
theorem, which we state without further proof. 


Theorem 12.5.14. Let B be any complex $n \times n$ matrix with $\vec{y}, \vec{z} \in \mathbb{C}^n$. 
Then 

\[ (B\vec{y}) \cdot \vec{z} = \vec{y} \cdot (\bar{B}^T \vec{z}) \]

for the complex dot product. As a consequence, if the matrix B is 
Hermitian, that is, $B = \bar{B}^T$, then all of B's eigenvalues are real, but 
B's eigenvectors remain complex. 


Our final (we promise) result of this section gives the best possible 
situation for the real symmetric matrices, namely, that they are all 
diagonalizable. We already know from Theorem 12.5.13 that if they are 
diagonalizable, then their diagonal matrix D is real, and so is their 
invertible matrix Q of eigenvectors. Theorem 12.5.15 will now also tell us 
that Q can be made into a very special type of matrix, an orthogonal 
matrix where $Q^{-1} = Q^T$. We will not prove Theorem 12.5.15, although 
Theorem 12.5.13 and Gram-Schmidt orthonormalization explain why Q 
can be made orthogonal. 


Theorem 12.5.15. All real symmetric matrices A are diagonalizable with 
$A = QDQ^T$, where Q and D are real-valued and the eigenvector matrix Q 
can be made an orthogonal matrix; that is, the columns of Q are mutually 
perpendicular unit vectors. 


A proof of the portion involving D and Q follows directly from Theorem 
12.5.13 and the Gram-Schmidt orthonormalization process, but we give no 
proof here that A must be diagonalizable. 




Note that a square matrix Q is orthogonal if and only if $Q^{-1} = Q^T$, or 
equivalently that $Q^T Q = I_n$, where $I_n$ is the identity matrix of the correct 
size. If det(Q) = 1 for an orthogonal matrix Q (orthogonal matrices always 
have a determinant that is $\pm 1$), then Q is called a rotation matrix since 
multiplication by Q will rotate all vectors about the origin (the zero 
vector) through some fixed angle $\theta$. In particular, if Q is a $2 \times 2$ rotation 
matrix, then 

\[ Q = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \]

for some fixed angle $\theta$. 


So, if A is a real symmetric matrix, then A is diagonalizable with 
$A = QDQ^T$, where D is a real diagonal matrix of the eigenvalues of A and Q is 
a real orthogonal matrix of corresponding eigenvectors of A. By negating 
one column of Q if necessary, Q can be chosen to be a rotation matrix. 
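
As a sketch of how such a Q might be produced in practice, the following input orthonormalizes the eigenvectors of a hypothetical real symmetric matrix with the built-in Orthogonalize command (which carries out a Gram-Schmidt style process) and checks the two identities. The names matS, matQ, and matD are illustrative.

```mathematica
(* Hypothetical real symmetric example *)
matS = {{2, 1}, {1, 2}};

{vals, vecs} = Eigensystem[matS];
matQ = Transpose[Orthogonalize[vecs]];   (* orthonormal eigenvectors as columns *)
matD = DiagonalMatrix[vals];

(* Q is orthogonal: Q^T Q = I *)
Simplify[Transpose[matQ] . matQ] == IdentityMatrix[2]     (* True *)

(* A = Q D Q^T *)
Simplify[matQ . matD . Transpose[matQ]] == matS           (* True *)

(* The determinant is 1 or -1; negating a column of matQ switches between them *)
Det[matQ]
```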


Hopefully, these facts about eigenvalues, eigenvectors, and 
diagonalizability have been helpful in clarifying this material as well as 
giving you topics for further study. We also hope that the few facts that 
have been given in this section without proof will encourage the reader to 
further investigate the myriad intricacies of linear algebra, and beyond. 


Homework Problems 


1. Let A be an $n \times n$ matrix that is diagonalizable. Let $P(\lambda)$ be A's 
characteristic polynomial. Show that P(A) is the $n \times n$ zero matrix 
where P(A) is the characteristic polynomial with $\lambda$ replaced by A. 
Note that it is not valid to replace $\lambda$ by A in the determinant formula 
for $P(\lambda)$. Why not? 


(Comment: This is a version of the Cayley-Hamilton theorem, 
which states that any square matrix A plugged into its characteristic 
polynomial gives the zero matrix back.) 

2. Let A be an $n \times n$ matrix with n distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Let 
$P(\lambda)$ be A's characteristic polynomial. Show that P(A) is the $n \times n$ 
zero matrix, where P(A) is the characteristic polynomial with $\lambda$ 
replaced by A. 




3. Let A be a diagonalizable matrix with all nonnegative real 
eigenvalues. Define $\sqrt[k]{A}$ for k any positive integer and check with 
an example that your definition works. 


4. Let A be a diagonalizable $n \times n$ matrix where all of A's 
eigenvalues satisfy $|\lambda| < 1$. Show that $\lim_{m \to \infty} A^m = 0_n$, where $0_n$ is 
the zero $n \times n$ matrix. 


5. Let A be any diagonalizable $n \times n$ matrix with $A = QDQ^{-1}$ for D 
the diagonal matrix having the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ on its 
diagonal. Show that $e^A$ must be invertible and that 

\[ \det\left( e^A \right) = e^{\lambda_1 + \lambda_2 + \cdots + \lambda_n} \]


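6. Let $\lambda$ be any real number and K be any nonzero real number. 
Then we know that the $2 \times 2$ matrix 

\[ A = \begin{bmatrix} \lambda & K \\ 0 & \lambda \end{bmatrix} \]

is nondiagonalizable. Show that 

\[ e^A = e^{\lambda} \begin{bmatrix} 1 & K \\ 0 & 1 \end{bmatrix} \]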

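7. Let $\lambda$ be any real number and K be any nonzero real number. 
Show that the following $3 \times 3$ matrices are nondiagonalizable. 

\[ \begin{bmatrix} \lambda & K & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}, \qquad \begin{bmatrix} \lambda & 0 & K \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix}, \qquad \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & K \\ 0 & 0 & \lambda \end{bmatrix} \]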


8. We know that the $3 \times 3$ matrices from problem 7 are all 
nondiagonalizable. Find the formulas for $e^A$ for each of them. 

9. Let $\lambda$ be any real number and K, L, and M be any real numbers 
where at least one of K, L, and M is nonzero. Show that the 
following $3 \times 3$ matrix is nondiagonalizable: 

\[ A = \begin{bmatrix} \lambda & K & L \\ 0 & \lambda & M \\ 0 & 0 & \lambda \end{bmatrix} \]

10. For the nondiagonalizable matrix A in problem 9, find a formula 
for $e^A$. You may have to consider specializing the situation to two 
of the three real numbers K, L, and M being equal and the third 
number being 0 and/or all three of K, L, and M being equal to each 
other. 

11. Use the methods of Section 12.4, along with problem 10 above, 
to solve the following homogeneous first-order system of linear 
differential equations, where x(t), y(t), and z(t) are three unknown 
functions of t with the initial conditions x(0) = 2, y(0) = -1, and 
z(0) = 3: 


dz 

dy E 

a 
a = y(t 
dt ait 


12. Let A be a real square matrix. Remember that $(Q^{T})^{-1} = (Q^{-1})^{T}$
and $(BC)^{T} = C^{T}B^{T}$.
(a) Show that A is diagonalizable if and only if $A^{T}$ is also
diagonalizable.
(b) If A is diagonalizable, how are the eigenvalues and
eigenvectors of A related to those of $A^{T}$?
(c) Is $e^{A^{T}} = \left(e^{A}\right)^{T}$ if A is diagonalizable? If yes, prove it; if
no, give a counterexample. If no, then how can we modify
things to make them appropriate to the complex matrix case?
13. Let Q be a real $n \times n$ orthogonal matrix, that is, $QQ^{T} = I_n$. Show
that the following are all true:
(a) $\det(Q) = \pm 1$
(b) $Q^{T}Q = I_n$ makes $Q^{T}$ orthogonal as well, and $Q^{T} = Q^{-1}$
(c) If S is also a real $n \times n$ orthogonal matrix, then the product
QS is also orthogonal.
(d) Do all of these facts still hold if we delete the term "real"
throughout?


Mathematica Problems 


1. Find, using Mathematica's DSolve command, the real solutions to
the homogeneous first-order system of linear differential equations 
given in homework problem 11. 


2. Use Mathematica wherever possible to verify the results 
discussed in the homework problems. 


Research Projects 


1. Find an advanced linear algebra text and review the proof that the
dimension of an eigenspace $E_{\lambda}$ is less than or equal to the
eigenvalue $\lambda$'s algebraic multiplicity $M_{\lambda}$.


2. Read the article by M. Carchidi [4] on finding an eigenvector for 
an eigenvalue of algebraic multiplicity 1. Now write some 
Mathematica code to carry out this article’s algorithm and test it out 
on an example to see if it works. 


3. Investigate the topic of canonical forms that categorizes square 
matrices by similarity types. Specifically, look into Jordan and 
rational canonical forms in an advanced linear algebra book. See if 
you can get Mathematica to do some computational work related to 
these types of canonical forms. 


4. Investigate the topic of inner products and inner product spaces. 
Specifically, look into when a square matrix A is orthogonally 
diagonalizable, such as the real symmetric matrices A. 




12.6 The Geometry of the Ellipse Using Eigenvalues and Eigenvectors


The purpose of this section is to look more closely at matrix multiplication 
by a real square matrix A and how we can interpret this multiplication 
geometrically. This geometric interpretation will involve conic sections, 
specifically the ellipse. For our analysis, we will also need some 
information from multivariable calculus involving optimizing functions of 
several variables subject to constraints and the method of Lagrange 
multipliers that solves such problems. The method of Lagrange multipliers 
is needed only for Example 12.6.1 since in Example 12.6.2 we use only 
linear algebra. 


In our examples, we will consider specific real square matrices and see 
how multiplication by these matrices has some interesting geometry 
related to eigenvalues and eigenvectors. 


Example 12.6.1. As usual, we begin with an example to illustrate what we have in mind.
We start with looking at multiplication by the matrix

\[ A = \begin{bmatrix} 12 & -2 \\ -7 & -1 \end{bmatrix} \]

geometrically by seeing what it does to all the vectors on the unit circle. This particular
matrix A is chosen so that it is diagonalizable with real D and Q. All of the vectors on the
unit circle can be written as columns of the form $[\cos(t) \;\; \sin(t)]^{T}$. We are going to
multiply the vectors of the unit circle by A, and plot the resulting vector endpoints in the
plane in order to see what parametric curve they form, as illustrated in Figure 12.5.


A = {{12, -2}, {-7, -1}};

UnitCircle = {{Cos[t]}, {Sin[t]}};

UnitCirclePlot = ParametricPlot[Flatten[UnitCircle], {t, 0, 2 Pi},
PlotStyle -> {Blue, Thickness[0.01]}];

(ATimesUnitCircle = A.UnitCircle) // MatrixForm

( 12 Cos[t] - 2 Sin[t] )
( -7 Cos[t] - Sin[t]   )

ATimesUnitCirclePlot = ParametricPlot[Flatten[ATimesUnitCircle],
{t, 0, 2 Pi}, PlotStyle -> {Red, Thickness[0.01]}];
Show[UnitCirclePlot, ATimesUnitCirclePlot, PlotRange -> All]


Figure 12.5: Plot of unit circle via the UnitCircle vector and its 
transformation under multiplication by the matrix A. 




The plot in Figure 12.5 tells us that multiplication by the matrix A has turned the unit 
circle and its vectors of length 1 into an ellipse with vectors of varying length. 
Multiplication by A has also rotated the vectors of the unit circle to vectors on the ellipse; 
otherwise we should expect the ellipse to have its axes parallel to the coordinate axes. 
Note that both the ellipse and the unit circle have the origin as center. 


A.{{1}, {0}}
{{12}, {-7}}

Angle1 = N[VectorAngle[A[[All, 1]], {1, 0}]]
0.528074

Length1 = Norm[A[[All, 1]]]
√193

A.{{0}, {1}}
{{-2}, {-1}}

Angle2 = N[VectorAngle[A[[All, 2]], {0, 1}]]
2.03444

Length2 = Norm[A[[All, 2]]]
√5


These two angles tell us that multiplication by A is rotating the points of the unit circle
but the angle of rotation is not constant since Angle1 and Angle2 are not the same.
Moreover, the effect of A on the lengths of vectors is not constant since Length1 and
Length2 are also not the same, although both come from A times a unit vector.


Let us create an animation of 100 frames that displays, on the unit circle, a short arrow
for the unit vector X, while on the ellipse, it displays the corresponding vector AX as a
long arrow. This animation will allow us to visualize how the multiplication by a matrix
A works geometrically:


Manipulate[
ArrowPlots =
{Graphics[{Arrowheads[.04], Thickness[.005], Red,
Arrow[{{0, 0}, Flatten[A.UnitCircle] /. t -> tval}]}],
Graphics[{Arrowheads[.03], Thickness[.004], Blue,
Arrow[{{0, 0}, Flatten[UnitCircle] /. t -> tval}]}]};
Show[UnitCirclePlot, ATimesUnitCirclePlot, ArrowPlots,
PlotRange -> All],
{{tval, 0, "t"}, 0, 2 Pi, 2 Pi/32}]


Figure 12.6: A single frame ($t = \frac{2}{3}\pi$) in the animation of
transformed unit vectors under matrix multiplication by A.


The frame of the animation, depicted in Figure 12.6, illustrates the 
geometric nature of multiplication by A in the sense of what it 
does to unit vectors. Now we want to see how eigenvalues and 
their eigenvectors come into play in this geometry. What we 
should be most curious about is the ellipse that is created by A 
from the unit circle. Let us find this ellipse’s rectangular equation 
in x and y, as well as plot two eigenvectors for A, one for each real 
eigenvalue obtained by multiplying the unit eigenvector by its 
eigenvalue so that we get each eigenvector to terminate on the 
ellipse: 


EigVals = Eigenvalues[A]
{13, -2}

EigVects = Eigenvectors[A]
{{-2, 1}, {1, 7}}

AEigVec1 = N[EigVals[[1]] EigVects[[1]]/Norm[EigVects[[1]]]]
{-11.6276, 5.81378}

AEigVec2 = N[EigVals[[2]] EigVects[[2]]/Norm[EigVects[[2]]]]
{-0.282843, -1.9799}

AEigVecPlots = Graphics[{Arrowheads[.03], Thickness[.005], Black,
Arrow[{{0, 0}, AEigVec1}], Arrow[{{0, 0}, AEigVec2}]}];

Show[UnitCirclePlot, ATimesUnitCirclePlot, AEigVecPlots,
PlotRange -> All, AxesOrigin -> {-14, -8}]


Figure 12.7: Plot of the unit circle, and its image under the map A. 
The two vectors are in the direction of A’s eigenvectors, with each 
having magnitude the corresponding eigenvalues. 




On inspection of Figure 12.7, notice that the two eigenvectors for A are nearly the major 
and minor axes of the transformed ellipse. Why are they so close, but not exactly there? 
There are two possible explanations: (1) the axes of an ellipse are perpendicular but these 
two eigenvectors are not, and so they could not provide the axes; or (2) perhaps in the 
case when the two eigenvectors are perpendicular, they do follow the axes. 




Let us continue this problem by finding the rectangular equation of our ellipse and 
locating its major and minor axes. 


SinCos = Solve[{x, y} == Flatten[ATimesUnitCircle], {Sin[t], Cos[t]}]
{{Sin[t] -> 1/26 (-7 x - 12 y), Cos[t] -> 1/26 (x - 2 y)}}

G = Expand[Simplify[(Sin[t]^2 + Cos[t]^2 - 1) /. SinCos]]
{-1 + (25 x^2)/338 + (41 x y)/169 + (37 y^2)/169}

PlotG = ContourPlot[Evaluate[G == 0], {x, -13, 13}, {y, -13, 13},
ContourStyle -> {Red, Thickness[0.01]}, PlotRange -> {{-13, 13},
{-10, 10}}, Axes -> True, Frame -> False, AspectRatio -> 10/13]


Figure 12.8: Plot of ellipse g(x,y) algebraically solved for above. 




The ellipse in Figure 12.8 is clearly our previous ellipse with center at the origin. In order 
to find the major and minor axes for this ellipse, we need to find the points on the ellipse 
closest to and farthest from the origin; these points are the endpoints of the major and 
minor axes. 

We need to apply the method of Lagrange multipliers to solve this problem of
maximizing and minimizing the objective function $f(x, y) = x^2 + y^2$ subject to the
constraint that the points (x, y) must be on the ellipse; that is, they have to satisfy the
equation

\[ \frac{25}{338}x^2 + \frac{41}{169}xy + \frac{37}{169}y^2 = 1 \]

or the equation $g(x, y) = 0$ for

\[ g(x, y) = \frac{25}{338}x^2 + \frac{41}{169}xy + \frac{37}{169}y^2 - 1 \]


The objective function f(x, y) is the square of the distance from a point (x, y) to the origin, 
and maximizing or minimizing it is the same as doing it for the distance itself while 
avoiding the distraction of square roots. 
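As a numerical cross-check of this setup, the constrained problem can also be handed directly to Mathematica's built-in NMinimize and NMaximize. This is only a sketch and not the method developed in this example; the names gEllipse, minSq, maxSq, minPt, and maxPt are introduced here for illustration and do not appear in the original code. It assumes that x and y have no assigned values.

```mathematica
(* Constraint function g(x, y) for the ellipse of this example *)
gEllipse = 25/338 x^2 + 41/169 x y + 37/169 y^2 - 1;

(* Minimize and maximize the squared distance x^2 + y^2 on the ellipse *)
{minSq, minPt} = NMinimize[{x^2 + y^2, gEllipse == 0}, {x, y}];
{maxSq, maxPt} = NMaximize[{x^2 + y^2, gEllipse == 0}, {x, y}];

(* Distances from the origin to the nearest and farthest points *)
{Sqrt[minSq], Sqrt[maxSq]}
```

The two numbers returned estimate the lengths of the semiminor and semimajor axes, while minPt and maxPt each give one endpoint of the corresponding axis as replacement rules for x and y.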


The method of Lagrange multipliers from calculus says the following. A solution (x, y) to
our problem of maximizing (or minimizing) the objective function $f(x, y)$ subject to the
constraint $g(x, y) = 0$ must satisfy

\[ \left[ \frac{\partial g}{\partial x} \;\; \frac{\partial g}{\partial y} \right] = \lambda \left[ \frac{\partial f}{\partial x} \;\; \frac{\partial f}{\partial y} \right] \]

for some constant $\lambda$ called a Lagrange multiplier. In our case, the equation involving the
multiplier $\lambda$ becomes


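\[ \left[ \frac{50}{338}x + \frac{41}{169}y \;\;\;\; \frac{41}{169}x + \frac{74}{169}y \right] = \lambda \left[ 2x \;\; 2y \right] \]

or when we divide by 2, we get

\[ \left[ \frac{25}{338}x + \frac{41}{338}y \;\;\;\; \frac{41}{338}x + \frac{37}{169}y \right] = \lambda \left[ x \;\; y \right] \]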


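You should not be surprised that we can write this last equation as the matrix equation

\[ \begin{bmatrix} \frac{25}{338} & \frac{41}{338} \\ \frac{41}{338} & \frac{37}{169} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \end{bmatrix} \tag{12.54} \]

This matrix equation says that our solution $X = [x \;\; y]^{T}$ is an eigenvector for the matrix

\[ B = \begin{bmatrix} \frac{25}{338} & \frac{41}{338} \\ \frac{41}{338} & \frac{37}{169} \end{bmatrix} \]

for an eigenvalue $\lambda$ of B. The two components x and y of the eigenvector X must also
satisfy $g(x, y) = 0$ in order to be a solution of our optimization problem.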


Since our eigenvectors X of B will give the major and minor axes of our ellipse if they are 
chosen to satisfy the ellipse g(x,y) = 0, we need to find a scalar a so that aX satisfies 
g(x,y) = 0. This works for this problem because any multiple of an eigenvector is also an 
eigenvector for the same eigenvalue. 


B = {{25/338, 41/338}, {41/338, 37/169}};
{BEigVec1, BEigVec2} = N[Eigenvectors[B]]
{{0.567376, 1.}, {-1.7625, 1.}}

a1 = Solve[(G == 0) /. {x -> a BEigVec1[[1]], y -> a BEigVec1[[2]]}, a][[1]]
{a -> -1.62138}

a2 = Solve[(G == 0) /. {x -> a BEigVec2[[1]], y -> a BEigVec2[[2]]}, a][[1]]
{a -> -6.88267}

BEigVec1.BEigVec2
-2.22045×10^-16

BEigVecPlots = Graphics[{Arrowheads[.03], Thickness[.005], Black,
Arrow[{{0, 0}, a BEigVec1 /. a1}], Arrow[{{0, 0}, a BEigVec2 /. a2}]}];

Show[PlotG, BEigVecPlots, PlotRange -> {{-13, 13}, {-10, 10}},
AspectRatio -> 10/13]


Figure 12.9: Plot of ellipse and Lagrange eigenvector solutions.


So we see from Figure 12.9 that the two eigenvectors of B do in fact determine the major
and minor axes of this ellipse. These two eigenvectors are also perpendicular, as was
confirmed above when we saw that their dot product is effectively zero. The eigenvector
BEigVec1 is the direction of the minor axis, while BEigVec2 is the direction of the major
axis.


Now let us compute the lengths of these two new eigenvectors, a1 BEigVec1 and a2
BEigVec2, which will allow us to write the equation of this ellipse in the standard form

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{12.55} \]

where we let a be the length of the semimajor axis and b be the length of the semiminor
axis. There are always two possible ways of writing the equation of a standard form
ellipse, either that of equation (12.55), or as

\[ \frac{x^2}{b^2} + \frac{y^2}{a^2} = 1 \tag{12.56} \]

Notice the only difference is the division by the constants a and b. In this manner, we will
always assume that a is the length of the semimajor axis and b is the length of the
semiminor axis. It is up to you to pick which one of these you want since you can rotate
any ellipse into either form.


Now we will plot the two ellipses to see if we correctly assumed that one is a rotation 
about the origin (their mutual center) of the other. This is depicted in Figure 12.10. 


{a, b} = {Norm[a BEigVec2 /. a2], Norm[a BEigVec1 /. a1]}
{13.9472, 1.86417}

PlotStandardForm = ContourPlot[{x^2/a^2 + y^2/b^2 == 1}, {x, -15, 15},
{y, -15, 15}, ContourStyle -> {Blue, Thickness[0.01]}, PlotRange ->
{{-15, 15}, {-10, 10}}, Axes -> True, Frame -> False, AspectRatio ->
2/3];
Show[PlotStandardForm, PlotG]


Figure 12.10: Plot of transformed ellipse and the ellipse with same 
major and minor axes corresponding to the x- and y-axes instead. 


As the very last part of this example, we compute the rotation angle needed to rotate the
original ellipse counterclockwise in order to get the standard-form ellipse. The
eigenvector -BEigVec2 is rotated to the x-axis, which allows us to compute this angle,
which seems to be under 45°. The calculation below tells us that this angle is close to 30°:

VectorAngle[-BEigVec2, {1, 0}] / Degree
29.5696


Next, we will look at the situation of Example 12.6.1 in full generality.
What this means is that we will theoretically study the non-standard-form
ellipse generated by all the vectors (cos(t), sin(t)) of the unit circle
multiplied by a $2 \times 2$ real matrix

\[ A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \]

Our goal is to know when this result is truly a non-standard-form ellipse
or something else, and when it is a non-standard-form ellipse we wish to
find two vectors that go from its center at the origin to the ends of its
major and minor axes. This will also allow us to find a standard form
equation of the ellipse that you get by rotating the original ellipse so that
its axes become the coordinate axes, and of course we want to find this
rotation angle $\theta$.


If we multiply A times all of the vectors of the unit circle given
parametrically as (cos(t), sin(t)) for $t \in [0, 2\pi]$, and label the result as
$X = [x \;\; y]^{T}$, then we have

\[ X = \begin{bmatrix} \alpha\cos(t) + \beta\sin(t) \\ \gamma\cos(t) + \delta\sin(t) \end{bmatrix} \tag{12.57} \]

In terms of x- and y-coordinates, this yields

\[ x = \alpha\cos(t) + \beta\sin(t), \qquad y = \gamma\cos(t) + \delta\sin(t) \tag{12.58} \]

If det(A) = 0, then the two rows of A form a dependent set of vectors and
so one of them must be a scalar multiple of the other, say, $(\alpha, \beta) = k(\gamma, \delta)$
for some scalar k. This tells us that the two equations (12.58) become the
single equation x = ky, which is the equation of a line through the origin.
So, when det(A) = 0, we do not get an ellipse, but the degenerate case of a
line through the origin. Clearly, this case is not very interesting, and so we
move on to the case of $\det(A) \neq 0$.


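If $\det(A) \neq 0$, then we want to solve the equation (12.57) for cos(t) and
sin(t) in terms of x and y, as we did in Example 12.6.1. Since A is
invertible, we get $[\cos(t) \;\; \sin(t)]^{T} = A^{-1}X$, which can be written out in full
as

\[ \begin{bmatrix} \cos(t) \\ \sin(t) \end{bmatrix} = \frac{1}{\det(A)} \begin{bmatrix} \delta x - \beta y \\ -\gamma x + \alpha y \end{bmatrix} \]

So

\[ \cos(t) = \frac{1}{\det(A)}(\delta x - \beta y), \qquad \sin(t) = \frac{1}{\det(A)}(-\gamma x + \alpha y) \]

If we now square these two equations and add them together, we have

\[ (\det(A))^2 = (\delta x - \beta y)^2 + (-\gamma x + \alpha y)^2 \]

Finally, simplifying this equation, we obtain

\[ (\delta^2 + \gamma^2)x^2 - 2(\beta\delta + \alpha\gamma)xy + (\alpha^2 + \beta^2)y^2 = (\det(A))^2 \tag{12.59} \]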


Back in Section 4.1, we discussed the fact that all conic sections (ellipses,
hyperbolas, and parabolas) have equations of the form given by equation
(4.1), which we reproduce here:

\[ ax^2 + bxy + cy^2 + dx + ey + f = 0 \tag{12.60} \]

This equation applies for real constants a through f, where not all of a, b,
and c can be zero, since if they were, you would have only a line, which is
a degenerate conic section. This quadratic equation in x and y is an ellipse
exactly when its discriminant $b^2 - 4ac$ satisfies $b^2 - 4ac < 0$. It is a
hyperbola if $b^2 - 4ac > 0$, and a parabola if $b^2 - 4ac = 0$. The degenerate
conic sections of points and lines are also possible. If both the constants d
and e are zero, then the center of this conic section is the origin.


Now let us compute the discriminant of our conic section, given in 
equation (12.59), and see if our equation is an ellipse: 
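\begin{align*}
b^2 - 4ac &= 4(\beta\delta + \alpha\gamma)^2 - 4(\delta^2 + \gamma^2)(\alpha^2 + \beta^2) \\
&= 4(\beta^2\delta^2 + 2\alpha\beta\gamma\delta + \alpha^2\gamma^2) - 4(\alpha^2\delta^2 + \alpha^2\gamma^2 + \beta^2\delta^2 + \beta^2\gamma^2) \\
&= 4(2\alpha\beta\gamma\delta) - 4(\alpha^2\delta^2 + \beta^2\gamma^2) \\
&= -4\det(A)^2 < 0
\end{align*}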
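The algebra leading to equation (12.59) and to the discriminant can be double-checked symbolically. The following is a minimal sketch, assuming the plain names al, be, ga, and de stand in for the entries α, β, γ, and δ of A; these names and the local variables lhs and disc are introduced here only for illustration.

```mathematica
Module[{al, be, ga, de, t, A, x, y, lhs, disc},
 A = {{al, be}, {ga, de}};
 (* A point on the image curve: A times a unit-circle vector *)
 {x, y} = A.{Cos[t], Sin[t]};
 (* Left-hand side of equation (12.59) *)
 lhs = (de^2 + ga^2) x^2 - 2 (be de + al ga) x y + (al^2 + be^2) y^2;
 (* Discriminant of (12.59), with xy-coefficient -2 (be de + al ga) *)
 disc = (-2 (be de + al ga))^2 - 4 (de^2 + ga^2) (al^2 + be^2);
 (* Both differences should simplify to zero *)
 {Simplify[lhs - Det[A]^2], Simplify[disc + 4 Det[A]^2]}]
```

Wrapping the computation in Module keeps the symbols local, so the matrix A and the variables x and y used elsewhere in this section are left untouched.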


So equation (12.59) is that of an ellipse with center at the origin since this
equation has no linear terms in x or y. If we divide
this equation by $(\det(A))^2$ so that the RHS is 1, the equation now has the
form

\[ ax^2 + 2bxy + cy^2 = 1 \tag{12.61} \]

where

\[ a = \frac{\delta^2 + \gamma^2}{(\det(A))^2}, \qquad b = -\frac{\beta\delta + \alpha\gamma}{(\det(A))^2}, \qquad c = \frac{\alpha^2 + \beta^2}{(\det(A))^2} \]


We want to rotate this ellipse about the origin so that it is in standard form 
with its major and minor axes as the coordinate axes. 


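Equation (12.61), corresponding to our ellipse, can be written in the
following matrix form:

\[ \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1 \tag{12.62} \]

On setting

\[ B = \begin{bmatrix} a & b \\ b & c \end{bmatrix}, \qquad X = \begin{bmatrix} x \\ y \end{bmatrix} \]

we can express equation (12.62) simply as

\[ X^{T}BX = 1 \tag{12.63} \]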

Note that in order to achieve this new form for our equation, we really
did require the 2b coefficient in front of the xy term in
equation (12.61), instead of just b as the coefficient. This matrix B is a real
symmetric matrix, and so it has two real eigenvalues, $\lambda_1$ and $\lambda_2$ (distinct
unless B is a multiple of the identity, in which case the ellipse is a circle),
with corresponding real perpendicular unit eigenvectors $\vec{x}_1$ and $\vec{x}_2$; if $\vec{x}_1$
and $\vec{x}_2$ are not both unit vectors, then divide each by its length to make
them unit vectors. The matrix B is diagonalizable with $B = QDQ^{-1}$ where

\[ D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \]

and Q is the matrix whose columns are the eigenvectors $\vec{x}_1$ and $\vec{x}_2$. Also,
$Q^{-1} = Q^{T}$ since Q is an orthogonal matrix with perpendicular unit column
vectors. This tells us that the equation $B = QDQ^{-1}$ can be written as BQ =
QD or $D = Q^{T}BQ$. Also note that neither of these two eigenvalues can be
zero since

\[ \det(B) = ac - b^2 = -\frac{1}{4}\left(4b^2 - 4ac\right) > 0 \]

where $4b^2 - 4ac < 0$ is the discriminant of the ellipse given in equation
(12.61), and $\det(B) = \lambda_1\lambda_2$.


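Note that the matrix Q is orthogonal. After replacing one of its columns by its
negative if necessary so that det(Q) = 1, there is some angle $\theta$ so
that

\[ Q = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \]

and multiplication by Q is a rotation of all vectors about the origin through
the angle $\theta$, where the vector (1, 0) is sent to the vector $(\cos(\theta), \sin(\theta))$ on
the unit circle.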


Let u and v be two new perpendicular coordinate axes with the same
origin as the x and y axes (the center of our ellipse), with the
u, v-coordinate axes corresponding to the major and minor axes of our
ellipse. Our ellipse should be in standard form in the u, v-coordinate
system

\[ \frac{u^2}{K^2} + \frac{v^2}{L^2} = 1 \tag{12.64} \]

where K and L respectively are the lengths of the semimajor and
semiminor axes of our ellipse. The claim is that


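\[ X = \begin{bmatrix} x \\ y \end{bmatrix} = Q \begin{bmatrix} u \\ v \end{bmatrix} \]

will work to place our ellipse in standard form. Let us replace X by $Q[u \;\; v]^{T}$
in equation (12.63) and see if we get an equation of the standard form
given in equation (12.64). For convenience, we will set $U = [u \;\; v]^{T}$ and
thus X = QU. Carrying out the desired replacement gives

\begin{align*}
1 &= X^{T}BX \\
&= (QU)^{T}BQU \\
&= U^{T}Q^{T}BQU \\
&= U^{T}\left(Q^{T}BQ\right)U \\
&= U^{T}DU \\
&= \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} \\
&= \lambda_1 u^2 + \lambda_2 v^2
\end{align*}

So $\frac{1}{\sqrt{|\lambda_1|}}$ and $\frac{1}{\sqrt{|\lambda_2|}}$ are the lengths of the semimajor and semiminor
axes of our ellipse depending on which one is the larger. This also tells us
that the unit perpendicular eigenvectors $\vec{x}_1$ and $\vec{x}_2$, which make up Q,
satisfy

\[ \vec{x}_1 = Q\vec{u}_1, \qquad \vec{x}_2 = Q\vec{u}_2 \]

for $\vec{u}_1 = (1, 0)$ and $\vec{u}_2 = (0, 1)$. So $\frac{\vec{x}_1}{\sqrt{|\lambda_1|}}$ and $\frac{\vec{x}_2}{\sqrt{|\lambda_2|}}$ are vectors that go
from the origin to the ends of the semimajor and semiminor axes of our
original ellipse.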
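The substitution X = QU can be confirmed symbolically for an arbitrary rotation angle. The sketch below is an illustration only: the names th, l1, l2, qRot, dLam, bSym, and xVec are assumptions introduced here, with qRot playing the role of Q, dLam the role of D, and bSym the role of B built as $QDQ^{T}$.

```mathematica
Module[{th, l1, l2, u, v, qRot, dLam, bSym, xVec},
 qRot = {{Cos[th], -Sin[th]}, {Sin[th], Cos[th]}};  (* rotation matrix Q *)
 dLam = DiagonalMatrix[{l1, l2}];                   (* diagonal matrix D *)
 bSym = qRot.dLam.Transpose[qRot];                  (* B = Q D Q^T *)
 xVec = qRot.{u, v};                                (* X = Q U *)
 (* X^T B X minus l1 u^2 + l2 v^2 should simplify to zero *)
 Simplify[xVec.bSym.xVec - (l1 u^2 + l2 v^2)]]
```

Because the identity holds for every angle th, the cross term in uv disappears no matter how the original ellipse is tilted.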


The rotation angle $\theta$ can now be found by the angle between the vectors
$\vec{u}_1$ and $\vec{x}_1$, since $\vec{x}_1 = Q\vec{u}_1$. If this angle is not between 0° and 90°, then
use $-\vec{x}_1$ instead of $\vec{x}_1$. This is necessary since the direction of $\vec{x}_1$ might
need to be reversed in order to get the correct angle of rotation as an angle
between 0° and 90°.


In the next example that follows, we will carry out the procedure outlined
above to put into standard form the equation of the ellipse $5x^2 - 8xy + 17y^2
= 1$, which has center at the origin. In the exercises you will have to deal
with the situation where this ellipse has center off the origin, as well as
placing in standard form the other types of conic sections: the hyperbola
and the parabola.


Example 12.6.2. The following equation

\[ 5x^2 - 8xy + 17y^2 = 1 \tag{12.65} \]

is an ellipse since its discriminant is $b^2 - 4ac = -276 < 0$ (see Fig. 12.11). We wish to
place this ellipse in standard form, finding a standard-form equation for it, determine the
rotation angle it takes to get it into standard form, and, of course, plot both ellipses to
verify our results.


In order to do this we need to find the eigenvalues and eigenvectors of the matrix

\[ B = \begin{bmatrix} 5 & -4 \\ -4 & 17 \end{bmatrix} \]

Recall that the equation of our ellipse is given by $X^{T}BX = 1$, where $X = [x \;\; y]^{T}$. Also, the
lengths of the semimajor and semiminor axes are $\frac{1}{\sqrt{|\lambda_1|}}$ and $\frac{1}{\sqrt{|\lambda_2|}}$ for the
eigenvalues $\lambda_1$ and $\lambda_2$ of B, where their corresponding eigenvectors $\vec{x}_1$ and $\vec{x}_2$ are the
direction vectors for the major and minor axes of our non-standard-form ellipse. These
two eigenvectors are not necessarily unit vectors, and so to get vectors that follow the
semimajor and semiminor axes, we need to take the unit eigenvectors $\frac{\vec{x}_1}{|\vec{x}_1|}$ and $\frac{\vec{x}_2}{|\vec{x}_2|}$
and then change their lengths to those of the semimajor and semiminor axes. So the
vectors

\[ \frac{1}{\sqrt{|\lambda_1|}} \frac{\vec{x}_1}{|\vec{x}_1|}, \qquad \frac{1}{\sqrt{|\lambda_2|}} \frac{\vec{x}_2}{|\vec{x}_2|} \]

go from the center, at the origin, to the ends of the semimajor and semiminor axes of our
non-standard-form ellipse.


PlotOrigEllipse = ContourPlot[{5 x^2 - 8 x y + 17 y^2 == 1}, {x,
-.6, .6}, {y, -.6, .6}, ContourStyle -> {Red, Thickness[0.01]},
PlotRange -> {{-.6, .6}, {-.4, .4}}, Axes -> True, Frame -> False,
AspectRatio -> 2/3]

Figure 12.11: Plot of the ellipse given by equation (12.65).


B = {{5, -4}, {-4, 17}};


BEigVals = N[Eigenvalues[B]]
{18.2111, 3.7889}

(Q = Transpose[N[Eigenvectors[B]]]) // MatrixForm

( -0.302776  3.30278 )
(  1.        1.      )

Q[[All, 1]].Q[[All, 2]]
-4.44089×10^-16


Note that Q is not an orthogonal matrix since its columns are not unit vectors, although 
they are perpendicular. We can make Q orthogonal by dividing each column of Q by its 
corresponding length. We do this next, as it will be needed for future calculations. 


L1 = Norm[Q[[All, 1]]]
1.04483

L2 = Norm[Q[[All, 2]]]
3.45084

(QN = Transpose[{Q[[All, 1]]/L1, Q[[All, 2]]/L2}]) // MatrixForm

( -0.289784  0.957092 )
(  0.957092  0.289784 )

Chop[Transpose[QN].QN] // MatrixForm

( 1.  0  )
( 0   1. )


Since $QN^{T}\,QN = I_2$, it follows that QN is an orthogonal matrix with unit perpendicular
columns. If we had used exact values for the entries of each eigenvector, we would have a
matrix that satisfies this identity exactly, but for our example, QN is close enough to orthogonal.


{K, L} = 1/Sqrt[BEigVals ] 


{0.234332, 0.51374} 


So the length of the semimajor axis is L = 0.5137402285 while the length of the 
semiminor axis is K = 0.2343321515. We can take as the standard-form equation 


\[ \frac{x^2}{L^2} + \frac{y^2}{K^2} = 1 \]

The standard-form ellipse and the original are depicted in Figure 12.12.

PlotNewEllipse = ContourPlot[{x^2/L^2 + y^2/K^2 == 1}, {x, -.6, .6}, {y,
-.6, .6}, ContourStyle -> {Blue, Thickness[0.01]}, PlotRange -> {{-.6,
.6}, {-.4, .4}}, Axes -> True, Frame -> False, AspectRatio -> 2/3];


Show[PlotOrigEllipse, PlotNewEllipse] 


Figure 12.12: Plot of the ellipse given by equation (12.65) and the 
ellipse with same major and minor axes corresponding to the x- 
and y-axes instead. 


BEigVecPlots = Graphics[{Arrowheads[.03], Thickness[.005], Black,
Arrow[{{0, 0}, K QN[[All, 1]]}], Arrow[{{0, 0}, L QN[[All, 2]]}]}];


Show[PlotOrigEllipse, BEigVecPlots] 


Figure 12.13: Plot of the ellipse given by equation (12.65) along 
with the transformed major and minor axes. 




From what we see in Figure 12.13, the semimajor and semiminor axes of the original
ellipse are the vectors

\[ L\vec{x}_2 = \frac{\vec{x}_2}{\sqrt{|\lambda_2|}}, \qquad K\vec{x}_1 = \frac{\vec{x}_1}{\sqrt{|\lambda_1|}} \]

where $\vec{x}_1$ and $\vec{x}_2$ are the eigenvectors of B and the columns of Q. Remember that these
formulas require Q to be made orthogonal. If Q is not orthogonal, simply transform the
columns of Q into unit vectors.


As to the value of the rotation angle $\theta$, it is the angle between the vector $\vec{x}_2$ and the
unit vector along the x-axis, (1, 0). From the calculation below, $\theta \approx 16.845^{\circ}$:

VectorAngle[QN[[All, 2]], {1, 0}] / Degree


16.845 
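The steps of Example 12.6.2 can be collected into one small helper. This is a sketch under stated assumptions, not a function from the text: the name EllipseAxes and its arguments coefA, coefB, and coefC are hypothetical, and the helper assumes the equation $ax^2 + 2bxy + cy^2 = 1$ really is an ellipse, so that both eigenvalues are positive.

```mathematica
(* Hypothetical helper: semi-axis data for coefA x^2 + 2 coefB x y + coefC y^2 == 1 *)
EllipseAxes[coefA_, coefB_, coefC_] := Module[{vals, vecs, lens},
  (* Eigenvalues and eigenvectors of the symmetric matrix B *)
  {vals, vecs} = Eigensystem[N[{{coefA, coefB}, {coefB, coefC}}]];
  lens = 1/Sqrt[vals];          (* semi-axis lengths 1/Sqrt[lambda] *)
  vecs = Normalize /@ vecs;     (* unit eigenvectors, one per row *)
  (* Lengths, followed by vectors from the center to the axis ends *)
  {lens, lens*vecs}]

(* The ellipse 5 x^2 - 8 x y + 17 y^2 == 1 has coefB = -4 *)
EllipseAxes[5, -4, 17]
```

The first part of the result should agree with the values of K and L found above, while the second part lists one endpoint vector per axis; the signs of these vectors depend on how Mathematica orients the eigenvectors.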


What we have done in this section for the ellipse can be extended to the 
other two types of conic sections, the hyperbola and the parabola, using 
eigenvalues and eigenvectors to find their standard-form equations where 
their axes are the coordinate axes if their centers are the origin. 




In addition, this material can be moved into three dimensions for the 
quadric surfaces of the ellipsoid, hyperboloid, and paraboloid. These 
quadric surfaces have equations of the general form 


\[ ax^2 + bxy + 2cxz + dy^2 + 2eyz + fz^2 + gx + hy + iz = j \tag{12.66} \]


for real constants a through j. These surfaces will have centers at the 
origin as long as all of the coefficients g, h, and i are zero. In the exercises, 
you will be asked to apply the ideas developed here for the ellipse to other 
conic sections as well as quadric surfaces. 


The fact that we can do all of this for the conic sections and quadric 
surfaces using eigenvalues and eigenvectors is called the principal axis 
theorem. This theorem can also be extended to the higher dimensions 
beyond three for quadratic polynomial equations in n variables. 


Now let us state the general principal axis theorem for n real variables
$x_1, x_2, \ldots, x_n$, which tells us how general quadratic polynomial equations in
these n variables can be rewritten without any cross terms; this is
sometimes called diagonalizing the equation. A general quadratic
polynomial equation in the n variables given has the form

\[ \sum_{i,j=1}^{n} g_{i,j}\,x_i x_j + \sum_{i=1}^{n} h_i x_i + k = 0 \tag{12.67} \]

for real constants $g_{i,j}$, $h_i$, and k, where $g_{i,j} = g_{j,i}$ for all $i \neq j$. If we define G
to be the matrix consisting of the $g_{i,j}$ terms [i.e., $G = (g_{i,j})$], then clearly G is a
symmetric matrix by construction. Furthermore, we need this matrix G to
be real symmetric so that it is rotationally diagonalizable, so we shall
assume that all $g_{i,j}$ are real. A translation of our n variables can be made to
eliminate the terms $\sum_{i=1}^{n} h_i x_i$ from this equation, and for simplicity we
assume that this has been done for the quadratic polynomial equation that
we use in the following theorem.


Theorem 12.6.1 (General Principal Axis Theorem). Let $x_1, x_2, \ldots, x_n$ be n
real variables. We are given the quadratic polynomial equation

\[ \sum_{i,j=1}^{n} g_{i,j}\,x_i x_j + k = 0 \tag{12.68} \]

for real constants $g_{i,j}$ and k, where the matrix $G = (g_{i,j})$ is real and
symmetric. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the n real eigenvalues of G and Q be the
real rotation matrix that diagonalizes G; that is, $G = QDQ^{T}$, where D is
the diagonal matrix of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, and Q is the
orthogonal matrix with determinant 1 of the corresponding eigenvectors
of G. Then in the new n real variables $u_1, u_2, \ldots, u_n$ defined by $U = Q^{T}X$
(or alternatively X = QU), where U and X are the column vectors whose
components are the $u_k$ and $x_k$ variables respectively, the original
quadratic equation (12.68) becomes the new quadratic equation:

\[ \sum_{j=1}^{n} \lambda_j u_j^2 + k = 0 \tag{12.69} \]


Proof. The proof is closely analogous to what we did in the discussion
following Example 12.6.1, if you realize that our quadratic equation
(12.68) can be written as $X^{T}GX = -k$, for the real symmetric matrix G and
variable column X given above.
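A quick numerical illustration of the theorem in three variables may help. The sketch below is not part of the proof: the sample matrix gSym was chosen here purely for illustration, and the names lam, vecs, qMat, and uVec are likewise assumptions. Orthogonalize is used only to guarantee orthonormal eigenvectors, and the check works whether det(qMat) is 1 or -1.

```mathematica
Module[{gSym, lam, vecs, qMat, uVec, u1, u2, u3},
 (* Sample real symmetric matrix G, chosen only for illustration *)
 gSym = N[{{2, 1, 0}, {1, 2, 1}, {0, 1, 2}}];
 {lam, vecs} = Eigensystem[gSym];
 (* Orthonormal eigenvectors placed as the columns of Q *)
 qMat = Transpose[Orthogonalize[vecs]];
 uVec = {u1, u2, u3};
 (* X^T G X with X = Q U, minus the sum of lam_j u_j^2; Chop removes round-off *)
 Chop[Expand[(qMat.uVec).gSym.(qMat.uVec) - lam.(uVec^2)]]]
```

If the cross terms have been eliminated as the theorem claims, every coefficient of the expanded difference is round-off noise and the result collapses to zero.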


Homework Problems 


1. Explain why the general principal axis theorem is correct. 


Mathematica Problems 
o 1 =o 
1. Let A = | 9 1 | 


(a) Find and plot the equation of the ellipse obtained by 
multiplying all of the points of the unit circle by A. 


(b) Find a standard-form equation for this ellipse, and plot it 
with the original ellipse. 


(c) Compute the rotation angle $\theta$ for rotating the original ellipse
into the standard-form ellipse.

(d) Plot the eigenvectors as arrows that extend from the origin 

to the ends of the major and minor axes for the original ellipse. 
2.LetA = l 3 i | 

í 9 

(a) Find and plot the equation of the ellipse obtained by 

multiplying all of the points of the unit circle by A. 

(b) Find a standard-form equation for this ellipse, and plot it 

with the original ellipse. 

(c) Compute the rotation angle $\theta$ for rotating the original ellipse

into the standard-form ellipse. 

(d) Plot the eigenvectors as arrows that extend from the origin 

to the ends of the major and minor axes for the original ellipse. 
3. Let $7x^2 - 10xy + 12y^2 = 1$ be the non-standard-form equation of
our original ellipse.

(a) Find a standard form equation for this ellipse, and plot it 

with the original ellipse. 

(b) Compute the rotation angle $\theta$ for rotating the original ellipse

into the standard-form ellipse. 

(c) Plot the eigenvectors as arrows that extend from the origin 

to the ends of the major and minor axes for the original ellipse. 
4. Let $9x^2 + 8xy + 3y^2 = 1$ be the non-standard-form equation of our
original ellipse. 

(a) Find a standard-form equation for this ellipse, and plot it 

with the original ellipse. 

(b) Compute the rotation angle $\theta$ for rotating the original ellipse

into the standard-form ellipse. 

(c) Plot the eigenvectors as arrows that extend from the origin 

to the ends of the major and minor axes for the original ellipse. 
5. Let $9x^2 + 4xy + 3y^2 + 58x - 28y + 256 = 0$ be the
non-standard-form equation of our original ellipse.
(a) Find values of K and L so that x = u + K and y = v + L is a
translation of this ellipse's equation into u, v-coordinates in
which the equation of this ellipse has the form $9u^2 + 4uv + 3v^2
+ M = 0$ for some constant M.
(b) Now find a standard-form equation for this ellipse, and plot
it with the original ellipse.
(c) Compute the rotation angle $\theta$ for rotating (and translating)
the original ellipse into the standard-form ellipse.
(d) Plot the eigenvectors as arrows that extend from its center 
to the ends of the major and minor axes for the original ellipse. 
6. Let $9x^2 + 16xy + 3y^2 = 1$ be the non-standard-form equation of our
original hyperbola.
(a) Find a standard-form equation for this hyperbola, and plot it
with the original hyperbola. The two possible standard-form
equations of a hyperbola are given here:

\[ \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \qquad \frac{y^2}{a^2} - \frac{x^2}{b^2} = 1 \]

(b) Compute the rotation angle $\theta$ for rotating the original
hyperbola into the standard-form hyperbola.


(c) Plot the eigenvectors as arrows with the original hyperbola 
to confirm that they are parallel to the hyperbola’s axes. 


Research Projects 


1. Find a discriminant test similar to that for conic sections that 
allows you to decide when a quadratic polynomial equation in the 
variables x, y, and z is an ellipsoid, that is, when an equation of the 
form 


\[ ax^2 + by^2 + cz^2 + 2dxy + 2exz + 2fyz + gx + hy + iz + k = 0 \]


is an ellipsoid. Now use this discriminant test and also plot the 
equation to determine whether the equation 


3x^2 + 4y^2 + 5z^2 + 2xy + 4xz + 6yz =
is an ellipsoid. 


Next, find a standard-form equation

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \]

for this ellipsoid using eigenvalues and eigenvectors, then plot this
standard-form ellipsoid with the original ellipsoid. Also, plot this 
original ellipsoid with three eigenvectors as arrows moving from its 
center at the origin to the ends of its three axes. 

2. (Continuation of project 1.) For the 3 × 3 matrix

$A = \begin{pmatrix} -2 & 9 & -1 \\ -4 & 1 & -r \\ 5 & -3 & 4 \end{pmatrix}$


find the equation of the ellipsoid you get if you multiply all the 
points of the unit sphere by A. Plot this ellipsoid as a parametric 
surface in space. Now find its standard-form equation. 

3. Investigate the topic of quadratic forms, which is related to inner 
products as well as conic sections and quadric surfaces. 


12.7 A Mathematica Eigen-Function


We end this chapter with a Mathematica procedure that you may find
useful when working on problems involving eigenvalues and eigenvectors.
The following procedure, called Eigen, takes a square matrix m whose
entries are real or complex numbers and an integer j = 1, 2, or 3. If m
is nondiagonalizable, then Eigen[m, j] returns the string
“non-diagonalizable” for each of the three values of j. If m is
diagonalizable, then Eigen[m, 1] returns the string “diagonalizable”.


If m is diagonalizable, then Eigen[m, 2] returns the diagonal matrix
diagm of m’s eigenvalues. Finally, if m is diagonalizable, then
Eigen[m, 3] returns the matrix Qm whose columns are the corresponding
eigenvectors of m, so that we have

$m = \mathrm{Qm} \cdot \mathrm{diagm} \cdot \mathrm{Qm}^{-1}$

or, written in terms of the outputs of Eigen,

$m = \mathrm{Eigen}[m, 3] \cdot \mathrm{Eigen}[m, 2] \cdot (\mathrm{Eigen}[m, 3])^{-1}$


The Eigen function tests whether m is a real symmetric matrix, in which
case it returns both Eigen[m, 2] and Eigen[m, 3] as real matrices. Also,
the Eigen function tests whether m is a Hermitian matrix, in which case it
returns Eigen[m, 2] as a real matrix:


Eigen[m_, j_] := ({vals, vecs} = N[Eigensystem[m]];
  (* diagonal matrix of eigenvalues; eigenvectors become the columns of Qm *)
  diagm = DiagonalMatrix[vals];
  Qm = Transpose[vecs];
  detQm = Chop[Det[Qm]];
  (* real symmetric m: return real matrices *)
  If[Transpose[m] == m && Conjugate[m] == m,
   (diagm = Re[diagm]; Qm = Re[Qm]), Null];
  (* Hermitian m: the eigenvalues are real *)
  If[ConjugateTranspose[m] == m, diagm = Re[diagm], Null];
  If[detQm == 0, Return["non-diagonalizable"],
   If[detQm != 0 && j == 1, Return["diagonalizable"],
    If[detQm != 0 && j == 2, diagm,
     If[detQm != 0 && j == 3, Qm, Null]]]])
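
The procedure above stores vals, vecs, diagm, Qm, and detQm as global variables. As a sketch only, the same logic can be written with local variables; the name EigenLocal and the lowercase qm are hypothetical and do not appear elsewhere in this text, and SquareMatrixQ assumes a reasonably recent version of Mathematica.

```mathematica
(* Hypothetical variant of Eigen: same three outputs, but no global side effects. *)
EigenLocal[m_?SquareMatrixQ, j : (1 | 2 | 3)] :=
 Module[{vals, vecs, diagm, qm},
  {vals, vecs} = N[Eigensystem[m]];
  diagm = DiagonalMatrix[vals];
  qm = Transpose[vecs];  (* eigenvectors as columns *)
  (* real symmetric input: keep only real parts *)
  If[Transpose[m] == m && Conjugate[m] == m,
   diagm = Re[diagm]; qm = Re[qm]];
  (* Hermitian input: the eigenvalues are real *)
  If[ConjugateTranspose[m] == m, diagm = Re[diagm]];
  (* same determinant test as Eigen *)
  If[Chop[Det[qm]] == 0,
   "non-diagonalizable",
   Switch[j, 1, "diagonalizable", 2, diagm, 3, qm]]]
```

With this variant, expressions such as Qm.diagm.Inverse[Qm] no longer work after a call, because nothing is left behind in global variables; one would write EigenLocal[m, 3].EigenLocal[m, 2].Inverse[EigenLocal[m, 3]] instead.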


(m = {{-1, 5}, {5, 3}}) // MatrixForm

-1 5
5 3

Eigen[m, 1]

diagonalizable


Eigen[m, 2] // MatrixForm

6.38516 0.
0. -4.38516

Eigen[m, 3] // MatrixForm

0.677033 -1.47703
1. 1.


Eigen[m, 3].Eigen[m, 2].Inverse[Eigen[m, 3]] // MatrixForm

-1. 5.
5. 3.

Qm.diagm.Inverse[Qm] // MatrixForm

-1. 5.
5. 3.
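
As a check by hand, the numbers returned for m = {{-1, 5}, {5, 3}} agree with the characteristic polynomial of m. The derivation below uses only the matrix m from this example, and the eigenvectors are scaled so that their second component is 1, as in the output of Eigen[m, 3].

```latex
\det(m - \lambda I) = (-1 - \lambda)(3 - \lambda) - 25 = \lambda^2 - 2\lambda - 28 = 0
\quad\Longrightarrow\quad
\lambda = 1 \pm \sqrt{29} \approx 6.38516,\ -4.38516 .

(m - \lambda I)\begin{pmatrix} x \\ 1 \end{pmatrix} = 0
\quad\Longrightarrow\quad
x = \frac{5}{\lambda + 1} = \frac{5}{2 \pm \sqrt{29}} \approx 0.677033,\ -1.47703 .
```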
Now we switch to a complex 3 × 3 matrix m2 that is Hermitian,
meaning that it equals its own conjugate transpose. In this case, the
eigenvalues of m2 are all real, but its eigenvectors are complex.


(m2 = {{5, 2 I, -I}, {-2 I, 3, 3 + 4 I}, {I, 3 - 4 I, 1}}) // MatrixForm


Eigen[m2, 1]
diagonalizable 


Eigen[m2, 2] // MatrixForm

7.7597 0. 0.
0. 4.74434 0.
0. 0. -3.50404

Eigen[m2, 3] // MatrixForm

0.687341 + 1.20836 i     0.622359 + 0.230016 i    -0.536188 - 0.662946 i
-0.875719 + 0.135769 i   1.79939 - 0.957197 i     -0.155913 + 0.243693 i
1.                       1.                       1.

Eigen[m2, 3].Eigen[m2, 2].Inverse[Eigen[m2, 3]] // MatrixForm

5. + 1.249 × 10^-16 i       -3.33067 × 10^-16 + 2. i    6.17852 × 10^-16 - 1. i
-6.66134 × 10^-16 - 2. i    3. + 5.55112 × 10^-16 i     3. + 4. i
1.88738 × 10^-15 + 1. i     3. - 4. i                   1. - 5.57792 × 10^-17 i


Chop[%] // MatrixForm

5.       2. i       -1. i
-2. i    3.         3. + 4. i
1. i     3. - 4. i  1.


As a third and final example of using Eigen, let’s see that it correctly 
returns “non-diagonalizable” when the matrix is nondiagonalizable. 


(m3 = {{-5, 1, 7}, {0, 8, 3}, {0, 0, -5}}) // MatrixForm

-5 1 7
0 8 3
0 0 -5

Eigen[m3, 1]
non-diagonalizable
Eigen[m3, 2]
non-diagonalizable
Eigen[m3, 3]
non-diagonalizable
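
One way to see why Eigen reports m3 as non-diagonalizable is to inspect the eigenvector list directly. The sketch below is not part of the Eigen procedure; the names vals3 and vecs3 are introduced here only for illustration, chosen so that they do not overwrite the global variables vals and vecs that Eigen sets. For an exact matrix with too few independent eigenvectors, Eigensystem pads its eigenvector list with zero vectors, so the matrix of eigenvectors has determinant 0.

```mathematica
m3 = {{-5, 1, 7}, {0, 8, 3}, {0, 0, -5}};
(* vals3, vecs3: illustrative names, not used elsewhere in the text *)
{vals3, vecs3} = Eigensystem[m3];
vecs3 // MatrixForm      (* rows are the eigenvectors; look for a zero row *)
MatrixRank[vecs3]        (* number of linearly independent eigenvectors *)
Det[Transpose[vecs3]]    (* the quantity that Eigen tests against 0 *)
```

The eigenvalue -5 is repeated but has only one independent eigenvector, so m3 has two independent eigenvectors rather than three, and this is what the determinant test in Eigen detects. For matrices entered with approximate (decimal) entries, the computed determinant may be small but nonzero, so the outcome can depend on the tolerance used by Chop.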




Bibliographic Material 


Cited References 


[1] Weisstein, E. “Completing the Square.” From MathWorld--A
Wolfram Web Resource. http://mathworld.wolfram.com/CompletingtheSquare.html


[2] Barnsley, M. (1993), Fractals Everywhere, San Francisco, CA: 
Morgan Kaufmann Publishers. 


[3] Anton, H., and Rorres, C. (2005), Elementary Linear Algebra with 
Applications, 9th ed., New York, NY: John Wiley & Sons. 


[4] Carchidi, M. (1986), “A Method for Finding the Eigenvectors of an n 
x n Matrix Corresponding to Eigenvalues of Multiplicity 1,” The 
American Mathematical Monthly 93(8), 647-649. 


Suggested Reading 


Apostol, T. (1997), Linear Algebra: A First Course, with Applications to 
Differential Equations, New York, NY: John Wiley & Sons. 


Bauldry, W., Evans, B., and Johnson, J. (1995), Linear Algebra with 
Maple, New York, NY: John Wiley & Sons. 


Cheung, C. K., Keough, G., Gross, R. and Landraitis, C. (2009), Getting 
Started with Mathematica, 3rd ed., New York, NY: John Wiley & Sons. 


Hardy, K. (2005), Linear Algebra for Engineers and Scientists Using 
Matlab, Boston, MA: Pearson Education. 


Herman, E., and Pepe M. (2005), Visual Linear Algebra, Boston, MA: 
Pearson Education. 


Höft H. F. W., and Höft, M. (2002), Computing with Mathematica, 2nd 
ed., San Diego, CA: Academic Press. 


Lawson, T. (1996), Linear Algebra, Mat Labs, New York, NY: John 
Wiley & Sons. 




Lax, P. (2007), Linear Algebra and Its Applications, 2nd ed., New York: 
John Wiley & Sons. 


Leon, S. (2002), Linear Algebra with Applications, 6th ed., Upper Saddle 
River, NJ: Prentice-Hall. 


Meade, D., May M., Cheung, C. K., and Keough G. (2009), Getting 
Started with Maple, 3rd ed., New York, NY: John Wiley & Sons. 


Olver, P., and Shakiban C. (2006), Applied Linear Algebra, Upper Saddle 
River, NJ: Prentice-Hall. 


Penney, R. (2008), Linear Algebra: Ideas and Applications, 3rd ed., New 
York, NY: John Wiley & Sons. 


Sauer, T. (2006), Numerical Analysis, Boston, MA: Pearson Education. 


Sewell, G. (2005), Computational Methods of Linear Algebra, 2nd ed., 
New York, NY: John Wiley & Sons, Inc. 


Shiskowski, K., and Frinkle, K. (2010), Principles of Linear Algebra with
Maple, New York, NY: John Wiley & Sons.


Szabo, F. (2000), Linear Algebra with Mathematica: An Introduction 
Using Mathematica, San Diego, CA: Academic Press. 


Szabo, F. (2002), Linear Algebra: An Introduction Using Maple, 
Burlington, MA: Harcourt/Academic Press. 


Wackerly, D., Mendenhall II, W., and Scheaffer, R. (1996),
Mathematical Statistics with Applications, 5th ed., Belmont, CA: Duxbury
Press.


Weiner, J., and Wilkens, G. (2005), “Quaternions and Rotations in E^4,”
The American Mathematical Monthly 112(1), 69-76.


Williams, G. (1996), Linear Algebra with Applications, 3rd ed., Dubuque, 
IA: William C. Brown Publishers. 


Yuster, T. (1984), “The Reduced Row Echelon Form of a Matrix is 
Unique: A Simple Proof,” Mathematics Magazine 61(2), 93—94. 




Indexes 


Keyword Index 
adjoint formula 

affine map 
antisymmetric matrix 
arclength 

array 

astroid 

backward substitution 


basis 
orthogonal 
orthonormal 
standard 


bijective 

binormal vector field 
canonical forms 
Cauchy—Schwarz inequality 
Cayley—Hamilton theorem 
characteristic polynomial 
circle 

cofactor 

column matrix 

column rank 

column space 


commutative diagram 




complex dot product 
complex numbers 
composition 


conic section 
discriminant 


constraint 

continuous function 
Cramer’s rule 

cross product 

cross product properties 
curvature 

degrees of freedom 
dependent set 


determinant 
column expansion 
definition 
row expansion 


determined system 
diagonalizable 


differential equation 
first-order linear 
general linear homogeneous solution 
homogeneous 
nonhomogeneous 


dimension 
discriminant 


divergence theorem 




dot product 
complex 
derivative identity 
real 


dot product properties 
eigenpolynomial 
eigenspace 


eigenvalue 
algebraic multiplicity 


eigenvalues 

eigenvector 

ellipsoid 

Euclidean n-space 

Euler’s formula 
exponential Maclaurin series 
exponential matrix 

field 

figure-eight knot 

forward substitution 
Fourier series 

fractals 

Frenet—Serret frame 
Frobenius norm
Gauss-Jordan elimination 
gradient 


Gram-Schmidt process 




Green’s theorem 
helix 

image 

inconsistent system 
independent set 


inverse 
function 
matrix 


invertible 

Jacobi identity 
Jordan forms 

kernel 

Lagrange multipliers 
Lagrange polynomial 
law of cosines 
least-squares fit 


Leontief model 
input—output 
open input-output 


LHS (left-hand side) 
line of fit 

line segment 

linear combination 


linear equation 
general 


linear map 




composition of 
image 

kernel 
reflection 
rescaling 
rotation 


linear regression 


linear system 
coefficients 
determined 
general 
homogeneous 
overdetermined 
square 
underdetermined 


linear transformation 
logarithmic spiral 
LU factorization 
Möbius surface 
magnitude 

Markov chains 


matrix 
additive properties 
antisymmetric 
augmented 
cofactor 
column rank 
column space 
definition of 
determinant 
determinant properties 
diagonal 




difference 
dimensions of 
elementary 
equality 
exponential 
Hermitian 
identity 

integral 

inverse 

inverse properties 
invertible 

lower triangular 
minor 
multiplication 
multiplication properties 
nonsingular 
orthogonal 
product 

rank 

row rank 

row space 
similar 

singular 

size of 

sparse 

sum 

symmetric 

table of properties 
trace 

transpose 
triangular 
unitary 

upper triangular 


matrix of coefficients 
method of integrating factors 
method of least squares 


method of least-squares 




method of separation of variables 
modulus 

Newton-Raphson method 
nonlinear equation 

norm 

normal vector 

objective function 
one-to-one function 

onto function 

orthogonal matrix 
orthogonal matrix properties 
orthogonal subspace 
orthogonal vectors 
osculating circle 

osculating plane 
overdetermined system 
overdetermined systems 
parallelepiped 
parallelogram law 


parametric 


curve 
surface 


perpendicular subspace 


plane 


polygon 




area of 
simple 


position vector field 
prime number theorem 
principal axis theorem 
projection 
pseudoinverse 
pseudosphere 
quadratic forms 
quaternions 

rank 

reflection 

rescaling 

RHS (right-hand side) 
right-hand rule 
rotation 

row matrix 

row rank 

row space 

rref 

scalar 

scalar multiplication 
similar matrices 


simple closed curve 
area enclosed by 




simultaneous solution 


solution 
particular 


spacecurve 
spanning subset 
sphere 

square system 
squared deviation 
standard basis 
subfield 


subspace 
dimension 
image of 
intersection 
inverse image of 
orthogonal 
perpendicular 
sum 
translate 
union 


superfield 


surface 
simple closed 


symmetric matrix 


system 
determined 
overdetermined 
square 
underdetermined


tangent vector field 




TNB frame 
torsion 
torus 

trace 


transpose 
properties 


triangle inequality 

unit square 

unit tangent vector 
unitary matrix 
variation of parameters 


vector 
addition properties 
component 
cross product properties 
dependent 
underdetermined system 
displacement 
dot product 
dot product properties 
independent 
length 
linear combination 
linear combination of 
linearly dependent 
linearly independent 
magnitude 
norm 
normal 
parallel 
parallelogram law 
position 
projection 




scalar multiplication 
subspace 
unit 


vector addition 
vector additive properties 
vector field 


vector space 


basis 
definition 


definition of C^n
definition of R^n
dimension 
Euclidean 


Wronskian 


zero vector 


Index of Mathematica Commands 


++ 


® 




% 
Abs 
All 


AnimationRate
Appearance
ArcCos
ArcTan
Array
Arrow
Arrowheads
AspectRatio
Axes
AxesEdge
AxesLabel
AxesOrigin
Block
Boxed
BoxRatios
Break
Ceiling
Chop
Circle
Clear
ClearAll
ClippingStyle
Coefficient 
Collect 

CompleteSquare
ComplexExpand
ConicFrame
ConicThrough5Points
Conjugate
ConstantArray
ContourPlot
ContourPlot3D
Contours
ContourStyle
ControlPlacement
ControlType

Cos 

Cross 

Cubics 

Degree 

Det 

DetByGaussianElimination
DiagonalMatrix
Dimensions
Directive


Do 

Dot (.) 

Drop 

DSolve 
Eigenvalues 
Eigenvectors 
Epilog 
Evaluate 
Exit[ ] 

e 

Expand 
FactorInteger
Flatten 

For 

Frame 
Graphics 
Graphics3D 
GraphicsArray
Li 

IdentityMatrix
If
ImageSize
Inverse
Join


Length 

Line 

ListPlot

Locator 
LUDecomposition 
Manipulate 

Map 

MatrixExp
MatrixForm
MatrixPower
MaxRecursion
Mesh 

Minor 

Module 

N 
Nest 
NIntegrate
Norm 
NRoots 
NSolve 


NSum 
Opacity 
ParametricPlot
ParametricPlot3D
Pi, π
Piecewise
PivotDown
Plot3D
PlotLabel
PlotMarkers
PlotPoints
PlotRange
PlotStyle
Point
PointSize
Polygon
PolynomialQuotientRemainder
Position
PrimePi
Print 
Product 
RandomReal
Re
Rectangle
ReplacePart
Return
Roots
Row
RowReduce
SaveDefinitions
Show 
Simplify 

Sin 

Solve 

SparseArray
Sphere 

Sqrt 

Sum 

Table 

Text 
Thickness 
Ticks 

ToRules
Total
Transpose
TrigExpand
VectorAngle
ViewPoint
While
WorkingPrecision


Wronskian 




PURE AND APPLIED MATHEMATICS 
A Wiley Series of Texts, Monographs, and Tracts 
Founded by RICHARD COURANT 


Editors Emeriti: MYRON B. ALLEN III, DAVID A. COX, PETER 
HILTON, HARRY HOCHSTADT, PETER LAX, JOHN TOLAND 


ADAMEK, HERRLICH, and STRECKER—Abstract and Concrete 
Categories


ADAMOWICZ and ZBIERSKI—Logic of Mathematics 


AINSWORTH and ODEN—A Posteriori Error Estimation in Finite 
Element Analysis 


AKIVIS and GOLDBERG—Conformal Differential Geometry and Its 
Generalizations 


ALLEN and ISAACSON—Numerical Analysis for Applied Science 
* ARTIN—Geometric Algebra 


ATKINSON, HAN, and STEWART—Numerical Solution of Ordinary 
Differential Equations 


AUBIN—Applied Functional Analysis, Second Edition 


AZIZOV and IOKHVIDOV—Linear Operators in Spaces with an 
Indefinite Metric 


BASENER—Topology and Its Applications 
BERG—The Fourier-Analytic Proof of Quadratic Reciprocity 
BERKOVITZ—Convexity and Optimization in R” 


BERMAN, NEUMANN, and STERN—Nonnegative Matrices in 
Dynamic Systems 


BOYARINTSEV—Methods of Solving Singular Systems of Ordinary 
Differential Equations 


BRIDGER—Real Analysis: A Constructive Approach 




BURK—Lebesgue Measure and Integration: An Introduction 
* CARTER—Finite Groups of Lie Type 


CASTILLO, COBO, JUBETE, and PRUNEDA—Orthogonal Sets and 
Polar Methods in Linear Algebra: Applications to Matrix Calculations, 
Systems of Equations, Inequalities, and Linear Programming 


CASTILLO, CONEJO, PEDREGAL, GARCIA, and 
ALGUACIL—Building and Solving Mathematical Programming Models 
in Engineering and Science 


CHATELIN—Eigenvalues of Matrices 


CLARK—Mathematical Bioeconomics: The Mathematics of 
Conservation, Third Edition 


COX—Galois Theory 


+ COX—Primes of the Form x? + ny? : Fermat, Class Field Theory, and 
Complex Multiplication 


* CURTIS and REINER—Representation Theory of Finite Groups and 
Associative Algebras 


* CURTIS and REINER—Methods of Representation Theory: With 
Applications to Finite Groups and Orders, Volume I 


CURTIS and REINER—Methods of Representation Theory: With 
Applications to Finite Groups and Orders, Volume II 


DINCULEANU— Vector Integration and Stochastic Integration in Banach 
Spaces 


* DUNFORD and SCHWARTZ—Linear Operators 
Part 1—General Theory 
Part 2—Spectral Theory, Self Adjoint Operators in Hilbert Space 
Part 3—Spectral Operators 


FARINA and RINALDI—Positive Linear Systems: Theory and 
Applications 


FATICONI—The Mathematics of Infinity: A Guide to Great Ideas 




FOLLAND—Real Analysis: Modern Techniques and Their Applications 
FROLICHER and KRIEGL—Linear Spaces and Differentiation Theory 
GARDINER—Teichmiiller Theory and Quadratic Differentials 


GILBERT and NICHOLSON—Modern Algebra with Applications, 
Second Edition 


* GRIFFITHS and HARRIS—Principles of Algebraic Geometry 
GRILLET—Algebra 
GROVE—Groups and Characters 


GUSTAFSSON, KREISS and OLIGER—Time Dependent Problems and 
Difference Methods 


HANNA and ROWLAND—Fourier Series, Transforms, and Boundary 
Value Problems, Second Edition 


* HENRICI—Applied and Computational Complex Analysis 


Volume 1, Power Series—Integration—Conformal 
Mapping—Location of Zeros 


Volume 2 Special Functions—Integral 
Transforms—Asymptotics—Continued Fractions 


Volume 3, Discrete Fourier Analysis, Cauchy Integrals, 
Construction of Conformal Maps, Univalent Functions 


* HILTON and WU—A Course in Modern Algebra 
* HOCHSTADT—Integral Equations 
JOST—Two-Dimensional Geometric Variational Procedures 


KHAMSI and KIRK—An Introduction to Metric Spaces and Fixed Point 
Theory 


* KOBAYASHI and NOMIZU—Foundations of Differential Geometry, 
Volume I 


* KOBAYASHI and NOMIZU—Foundations of Differential Geometry, 
Volume II 


KOSHY—Fibonacci and Lucas Numbers with Applications 




LAX—Functional Analysis 
LAX—Linear Algebra and Its Applications, Second Edition 


LOGAN—An Introduction to Nonlinear Partial Differential Equations, 
Second Edition 


LOGAN and WOLESENSK Y—Mathematical Methods in Biology 
MARKLEY—Principles of Differential Equations 


MORRISON—Functional Analysis: An Introduction to Banach Space 
Theory 


NAYFEH—Perturbation Methods 


NAYFEH and MOOK—Nonlinear Oscillations 
O’LEARY—Revolutions of Geometry 
O’NEIL—Beginning Partial Differential Equations, Second Edition 


PANDEY—The Hilbert Transform of Schwartz Distributions and 
Applications 


PETKOV—Geometry of Reflecting Rays and Inverse Spectral Problems 
* PRENTER—Splines and Variational Methods 

PROMISLOW—A First Course in Functional Analysis 

RAO—Measure Theory and Integration 


RASSIAS and SIMSA—Finite Sums Decompositions in Mathematical 
Analysis 


RENELT—Elliptic Systems and Quasiconformal Mappings 


RIVLIN—Chebyshev Polynomials: From Approximation Theory to 
Algebra and Number Theory, Second Edition 


ROCKAFELLAR—Network Flows and Monotropic Optimization 
ROITMAN—Introduction to Modern Set Theory 
ROSSI—Theorems, Corollaries, Lemmas, and Methods of Proof 




* RUDIN—Fourier Analysis on Groups 


SENDOV—The Averaged Moduli of Smoothness: Applications in 
Numerical Methods and Approximations 


SENDOV and POPOV—The Averaged Moduli of Smoothness 


SEWELL—tThe Numerical Solution of Ordinary and Partial Differential 
Equations, Second Edition 


SEWELL—Computational Methods of Linear Algebra, Second Edition 
SHICK—Topology: Point-Set and Geometric 


SHISKOWSKI and FRINKLE—Principles of Linear Algebra With 
Maple™ 


SHISKOWSKI and FRINKLE—Principles of Linear Algebra With 
Mathematica® 


* SIEGEL—Topics in Complex Function Theory 
Volume 1—Elliptic Functions and Uniformization Theory 
Volume 2—Automorphic Functions and Abelian Integrals 


Volume 3—Abelian Functions and Modular Functions of Several 
Variables 


SMITH and ROMANOWSKA—Post-Modern Algebra 
SOLIN-Partial Differential Equations and the Finite Element Method 
STADE—Fourier Analysis 

STAHL—Introduction to Topology and Geometry 


STAKGOLD and HOLST—Green’s Functions and Boundary Value 
Problems, Third Edition 


STANOYEVITCH—Introduction to Numerical Ordinary and Partial 
Differential Equations Using MATLAB® 


* STOKER—Differential Geometry 
* STOKER—Nonlinear Vibrations in Mechanical and Electrical Systems 


* STOKER—Water Waves: The Mathematical Theory with Applications 




WATKINS—Fundamentals of Matrix Computations, Third Edition 
WESSELING—An Introduction to Multigrid Methods 
+ WHITHAM—Linear and Nonlinear Waves 


ZAUDERER—Partial Differential Equations of Applied Mathematics, 
Third Edition 


* Now available in a lower priced paperback edition in the Wiley Classics 
Library. 


+ Now available in paperback. 




